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NOMENCLATURE 


A matrix of partial derivatives of the observations with respect 

to the state 

B matrix of partial derivatives of the observations with respect 

to the systematic errors 

E expectation operator 

G matrix relating w to x 

H,h noise-free observation 

I identity matrix 

L linear operator 

L(x) likelihood function 

N tracking normal matrix 

n data noise 

p probability density function 

magnitude of the station radius vector 
r data residual 

s systematic error 

T final time; duration of a tracking interval 

t time 

U state transition matrix 

u system control 

W weighting matrix 

w state noise 

x system state 

y data vector; the integral of the data vector 

z data vector 

a derivative of state noise; unknown acceleration 

T covariance matrix of the data noise 

5(s) Dirac delta function (6 (s) = 0 if s ^ 0, 5(0) = 1) 

e error 

A covariance matrix of x 

A Lagrange multiplier vector 

y e(x) 

p integral of the data vector; data residual 

Z covariance matrix 

a standard deviation 

t time interval 



Subscripts 


c 

I 

LS, WLS 

M 

MV 

o 

T 


Computed 
Ito calculus 
Weighted least squares 
Measured 

Minimum variance 

Initial 

Final 


Superscripts 


T 




Estimate 

Transpose 

d__ 

dt 



SUMMARY 


This report describes the state of the art in sequential processing 
techniques for trajectory estimation as seen by the authors. Some of the 
material presented is a tutorial interpretation and summary of well known 
results; other material is new . A major topic discussed is the effect of 
nonlinearities, with the aim of developing and justifying practical algori- 
thms for treating the problem. Since nonlinear, maximum likelihood esti- 
mation theory has been successfully used in orbit determination for some 
years, this approach has been emphasized over the more recently suggested 
nonlinear minimum variance technique. 

Section 1 is the Introduction, and outlines the technical problems 
to be discussed. Section 2 describes the minimum variance estimation 
technique and some of the approximations which have been suggested in order 
to make the algorithm practical. Section 3 discusses maximum likelihood 
estimation, control, and error analysis of the form presently employed for 
orbit determination. Section 4 derives the maximum likelihood estimator 
for unknown acceleration (state noise) , and develops a practical algorithm 
for solving the problem. Section 3 discusses some nonlinear and numerical 
properties of sequential orbit determination algorithms, and derives a 
theorem which is fundamental to some forms of sequential estimation. 

Section 6 analyzes estimation in the presence of a slight nonlinearity, and 
compares several reasonable estimators. Section 7 treats orbit parameters 
which, for some reason such as computer limitation, have not been included 
in the estimation process (systematic errors). Section 8 discusses the 
treatment of correlated data which does not have the usually assumed 
Markoff property. 

This report describes the work of six individuals. Sections 1 and 3 
were contributed by C. G. Pfeiffer; Head, Mathematical Physics Section, 
Guidance and Analysis Dept. , TRW Systems Group. Section 2 by R. E. 
Mortensen; Assistant Professor of Engineering, UCLA, and consultant, 

TRW Systems Group. Section 4 by C. G. Pfeiffer and D. D. Morrison; Staff 
Engineer, Analytical Research Operations, TRW Systems Group. Section 5 by 
D. D. Morrison. Section 6 by J. V. Breakwell; Professor of Engineering, 
Standford University, and consultant, TRW Systems Group. Section 7 by 


vii 



W. H. Berry; Member of the Technical Staff, Guidance and Analysis Dept., 

TRW Systems Group, and C. G. Pfeiffer. Section 8 by M. H. Merel; Member 
of the Technical Staff, Guidance and Analysis Dept. , TRW Systems Group. 
Attempting to maintain continuity between the rather diverse subjects 
treated, compilation and minor revision was done by C. G. Pfeiffer. 

Editing and typing was performed by D. A. Henderson. 

In order to avoid duplicate publication, not all of the analysis 
completed under this contract has been included here. Thus to complete the 
discussion on sequential processing techniques the reader should refer to 
’’Mathematical Problems of Modeling Stochastic Nonlinear Dynamic Systems” 
by R. E. Mortensen, issued as NASA Report CR-1168, and "On the Identification 
of Observable Orbit Parameters, with Application to Lunar Orbiter Tracking" 
by C. G. Pfeiffer, which is scheduled to appear in the January-February 
1969 issue of the Journal of the As tronautical Sciences. (The latter work 
was also partially supported by NASA contract NAS 9-4810, administered by 
Manned Spacecraft Center, Houston, Texas). In addition, two other papers 
not covered by this contract should be considered as part of this discuss- 
ion: "Maximum Likelihood Recursive Nonlinear Filtering" by R. E. Mortensen, 
published in Journal of Optimization Theory and Application s (Vol. 2, No. 6, 
1968, pp 386-394), and "Methods for Nonlinear Least Squares Problems and 
Convergence Proofs" by D. D. Morrison, published in the Jet Propulsion 
Laboratory Tracking and Orbit Determination Seminar Proceedings (February 
23-26, 1960, Pasadena, California). 



1. INTRODUCTION 


In the orbit determination problem we are given a nonlinear dynamic 
system of the form 

= f(x(t), u(t), t) + (^) o ^ t ^ t (l.i) 

where x is the state vector, consisting of position and velocity of the 

spacecraft plus any parameters which affect the acceleration of the vehicle 
dxj_ 

(in which case =0, x_^ = constant); u(t) is a control (guidance) to be 

applied, such as one or more acceleration impulses to be supplied by the 

dw 

spacecraft engine; and is state noise, such as random solar wind or 
leakage of the spacecraft attitude control system. We know that sometime 
during the flight we will be given a sequence of data points 

z (N) = [z(t Q ), z(t 1 ) ••• z(t N )] (1.2) 

and that this data will be used in some way in real time to predict 
(estimate) the state at the final time T, x(T), or at some earlier time. 

It is feasible to do this, for if (1.1) is thought of as a discrete equation, 
so that there are a finite number of state disturbances dw(t^) , then we can 
model the data as 


z(N) = H n (x(0), dw) + n(N) (1.3) 

where x(0) is the initial condition, dw is a vector of state disturbances 
up to t^, and, n(N) is data noise. Thus, via (1.1), (1.3) and the assumed 
apriori statistics of x(0) , dw, and n(N), a statistical relationship 
between z(N) and x(T) can be derived, and, given z(N), a meaningful estimate 
of x(T) can be obtained. If such an estimate is made at t^ < t^, based upon 
a partial set of data z (K) , and the estimate is later revised at t^ > t^, 
based upon the larger data set z(J), the estimation procedure is said to be 
sequential. Sequential estimation is almost always required for space 
missions, for the guidance corrections must be applied before aTi the data 
has been obtained. ^ 


The ultimate sequential estimation procedure is to revise the estimate 
of x(T) after each new data point z(t^) is obtained. It is in general neither 
necessary nor practical to do this, but, for the special case of linear 
systems with Gaussian, white disturbances n and dw, Kalman has shown [1],[2] 
an elegant way to do this. His work was generalized in [3] to obtain for 
an arbitrary stochastic process the necessary and sufficient conditions 
which must apply (i.e., the wide sense Markoff property) if the procedure 
is to yield the minimum variance estimate. It was then pointed out in [4] 
that, if the necessary and sufficient conditions apply, the stochastic 
process can always be modeled as a linear dynamic system with white Gaussian 
disturbances . 

Thus in the nonlinear case it is not at all clear how to do sequential 
estimation. Several approaches have been tried or suggested: 

(1) apply the linear Kalman filter, hoping that enough data will 
be obtained to cause the estimate to converge to a reasonable 
answer 

(2) apply a modified form of the linear Kalman filter which includes 
nonlinear correction terms 

(3) calculate the true minimum variance estimate, which is the first 
moment of the conditional probability density function of the 
state, given the data 

(4) derive some approximation to the true minimum variance estimate 

(5) calculate the true maximum likelihood estimate by iterating over 
all the available data to find the root of the differentiated 
likelihood function / 

(6) derive an approximation to the true maximum likelihood estimate 

Methods (1) and (2) sometimes work, but theoretical justification is lacking 
and counterexamples can be constructed where such an approach is poor. 

Method (3) is elegant and intuitively satisfying, but is at present not 
practical because a multidimensional, nonlinear partial differential equa- 
tion must be solved. Method (4) is another way of arriving at Method (2), 
and suffers the same limitations. Method (5) is presently used in orbit 
determination work, where a sequential estimate is needed at only a few times 
during the mission. This approach has been highly successful, but has the 
potential disadvantage for future applications that some time and computer 
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capability is required in order to find the estimate. Method (6) has 
received little attention, mainly because Method (5) has been available and 
the computational load and time lag have not been an important factor. 

In this report we discuss certain questions arising in nonlinear sequen- 
tial orbit determination. In so doing we will have in mind the following 
problem areas: 

(1) stochastic modeling - It is not at all clear what meaning can be 
given to a solution of the nonlinear differential equation (1.1) when state 
noise C~j^r) is present. This question, which requires that the analyst take 
a position on the Ito vs. Stratonovich calculus, has been treated under this 
contract and is discussed in [5]. We shall not repeat that work here; suffice 
it to say that the Stratonovich interpretation will be assumed unless it is 
stated otherwise . 

(2) minimum variance vs. maximum likelihood estimat ion - Either of 
these well known estimators could be employed, depending upon factors such 
as ease of mechanization for a particular problem and personal taste. These 
two forms will be discussed in Sections 2 and 3. 

(3) combined estimation and control - Although this report is primar- 
ily devoted to the estimation problem, it is well known that the combined 
estimation and control problem leads to some new and puzzling considerations. 
For example, the partial differential equation of Method (3) above should 

be modified so as to obtain the equivalent of the dynamic programming 
solution. The combined problem is discussed in Section 3 from the maximum 
likelihood point of view in an attempt to justify the presently employed 
procedure in orbit determination and guidance work. 

(4) the effect of nonlinearities - This question, which is central to 
this report, is discussed from various points of view in Sections 2-6. 

(5) numerical considerations - It is well known that even in linear 
systems numerical roundoff error can cause the estimate to diverge from 
the true value. This question is discussed in Section 5. 

(6) convergence properties - If one is willing to iterafce^oyer some - 

or all of the data in order to get an estimate, it needs to besfshown that 

the chosen algorithm will converge and that the converged answer will be 
unique. This question is discussed in Section 5 and in Reference [6], 



(7) selection of observable orbit parameters - Whatever algorithm is 
employed, it is clear that the number of orbit parameters (i.e., initial 
condition components and acceleration parameters) should be restricted to 
the minimum required to model the data, that is, only the "observable" 
parameters should be estimated. This question was treated under this 
contract, and is discussed in Reference [7]. 

(8) consideration of systematic errors - As a practical matter, it is 
usually true that some of the observable parameters are left out of the 
model because of, say, computer limitations. The treatment of these unes- 
timated parameters, which are called systematic errors, is discussed in 
Section 7. 

(9) treatment of correlated data - Derivations of orbit determination 
algorithms usually assume uncorrelated (white) data noise, supposing that 
correlated data could be easily incorporated as part of the model of the 
dynamic system. This is not necessarily true, especially if the data noise 
is not a first order Markoff process. This question is discussed in 
Section 7. 



2. MINIMUM VARIANCE ESTIMATION 


2.1 INTRODUCTION 

The various techniques of sequential stochastic estimation have their 

origins in the early least-squares differential correction schemes for orbit 

determination. One justification for these early schemes is the theory of 

maximum likelihood estimation to be discussed in Section 3. An account of 

the development of orbit determination methods is given in the paper by 

Mowery [8]. The nonlinear, minimum variance sequential estimator follows 

from the suggestion of Stratonovich [9] that the fundamental entity in 

sequential estimation is the conditional probability density function 

p(£,t|y r -,) for the current state (£) being estimated, given the record 

of the observational data (y r -, ) . In Sections 2.3 and 2.4 the partial 

^o * ^ 

differential equations for this function are discussed. The basic problem 
is to make inferences concerning the behavior of a certain Markov stochastic 
process, using only the information to be gleaned from a knowledge of a 
model for generating the process together with the past history of obser- 
vations of a related process. 

In other words, we are given a model which generates two related 
stochastic processes. The problem is to compute the conditional probability 
distribution of the current state of the first process, given the past his- 
tory of observations of the second process. S tratonovich [20] attacked this 
problem in accordance with the standard approach used in the study of 
Markov processes, namely, to find a partial differential equation for the 
transition probability density function. Stratonovich derived a nonlinear 
stochastic partial integro-dif ferential equation which he asserted is obeyed 
by the conditional probability density for the current state of the first 

process given the record of observations of the second process. Somewhat 

[ 10 ] 

later, the same equation was rederived by Kashyap 

[ 11 ] 

Still later, Kushner published a paper in which he claimed 

S tratonovich 1 s results were in error because of a failure to take into 
account certain second order terms. This issue arises because the observed 
process contains additive white noise. As a consequence, the partial diff- 
erential equation for the conditional density effectively contains a white 
noise forcing term. This means that certain mathematical pathologies arise 


[ 12 ] 

in dealing with this equation. In a revised version of his paper, 

Kushner clarified this issue somewhat and effectively recognized that the 

discrepancy between his results and those of Stratonovich is related to the 

divergence between the Ito and Stratonovich stochastic calculi^ \ In a 

[13] 

correspondence item, Bucy emphasized the importance of the Ito calculus 

and presented some results which are equivalent to some of those of Kushner. 

The subject of nonlinear filtering has been the topic of a number of 

[13] 

reports and papers (See References 14-27) since Bucy's note . The purpose 
of this Section is to present a tutorial discussion of the main results 
which have been obtained so far, attempting to resolve some questions which 
have arisen concerning these works, and to discuss some suggested approxi- 
mations intended to simplify the calculation of the nonlinear minimum var- 
iance estimate. 

2.2 THE MODEL AND THE PROBLEM 

In this Section, the problem and the main results will be stated first 
in Stratonovich form, in order to be as intelligible as possible to those 
not familiar with the Ito Calculus. Following that, the same material will 
be presented in Ito form. See Reference [5] for an explanation of the 
Stratonovich and the Ito stochastic calculus. For simplicity, only the case 
of scalar-valued random processes will be treated in detail, since the 
generalization to the vector-valued case is quite direct. 

Consider a plant described by a nonlinear differential equation with 
a white noise forcing term: 


:( t) = f (x(t) , t) + n (t) 


( 2 . 1 ) 


The state x(t) is not directly observable. Rather, in general one observes 
a nonlinear time-varying function of x(t) which is corrupted by more white 
noise: 


y ( t) = h (x ( t ) , t) + n 2 (t) 


( 2 . 2 ) 


At the initial time t , the initial state has a probability distribution 
which has a density function denoted as p^(£) . One knows the function 
P q ( 5) , and also one knows the history of y(x) over the interval t Q < t < t, 
Denote the record of this function over the whole interval as y^ 
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In Reference [5] y it is explained that 'the Markov stochastic process 
generated by equation (2.1) is characterized by its transition density 
function p(£,t|n,s). This density function is obtained as a solution to 
the Fokker-Planck-Kolmogorov partial differential equation. 

The problem is to find out how this density function is modified when 

one conditions only on the information contained in the record of observations 

y r , rather than on any exact knowledge of the state at some time prior 
L o* J 
to t. 

Define the conditional density function p(£,t|y f ,) by 

L V tJ 


The problem, then, is to compute this function. 

2.3 THE PARTIAL DIFFERENTIAL EQUATION FOR THE CONDITIONAL DENSITY 

We assume explicitly that the noise terms and in (2.1) and (2.2) 
are gauss ian and independent, and that 

E jn l( t)j = E jn 2 (t)J = 0 ; E jn^On^x) ) = 0 Vt, x; 

E |n 1 (t)n 1 (x)| = 6 (t-x) ; E jn 2 (t)n 2 (x)j = 6(t-x) (2.4) 

Let p(E, t|u,t ) be the transition function for the Markov process 
generated by (2.1), and let p q be the probability density for the initial 
state. For convenience, define the function q(£,t) by 


]!' 


(2.3) 


p(S»t|yr t = Pr j^ 4 x (t) < ? + dC |y 


q(€» t) 


/ 

—oo 


p(C,t|n,t a ) p o (n) dn 


(2.5) 


Let E^ denote mathematical expectation taken over the sp J ace_of sample 

paths of the x(t) process alone, i.e., E^ denotes averaging over only the 

ensemble of realizations of solutions of (2.1). Let 6(x f ,, y r -) 

Lt o ,tJ Lt o ,tJ 


JSSBL ■ 
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denote the quantity 




!/[ 


6 (x r i , y r i) = exp { / | h(x(T),i)y(i) - ^ h 2 (x(t),t)| dx } (2.6) 


,T> ] 


Let r(?,y 


I'o’ 1 ] 


, t) denote the quantity 


r<5 * y [ t o* t ]’ C) ' E J 8(x [to.«]’ y [%.t] ) | 


x(t) = dq(C,t) 


(2.7) 


It is shown in References [14] and [19] that the sought-for quantity in 
(2.3) may be written 



So far, all the equations we have written have the correct form 
regardless of whether the Ito or the Stratonovich calculus is to be used. 
However, from this point on it makes a difference whether the integral in 
the exponent in (2.6) is interpreted as an Ito integral or a Stratonovich 
integral. We will proceed from this point following the Stratonovich 
approach. 


It is shown in Ref erence [19] that r(£,y r -.>t) satisfies the forward 

L V J 

equation 


3r 

3t 


d r [f (C*t)rj + 2 


35 2 


h(C,t)y(t) - \ h 2 (£,t) 


(2.9) 


Now define 


*(t) 


oo 



( 2 . 10 ) 


- 8 - 


Equation (2.8) becomes 


p<5 ’ t|y [^.f 


= ^ 1 (t) r(C,y r i , t) 


[V 1 ] 


( 2 . 11 ) 


Employing the usual rules of calculus , one has 


^ 1 ( t) -|~ - ip 2 (t) ifi(t) r 


( 2 . 12 ) 


After some manipulation and the use of (2.8) - (2.12), one arrives at 

the following forward equation for p(£,t|y r ,): 

l o ,tJ 

= - ^[f(5.t) P ]+ \ + j^h( ? ,t)y(t) - \ h 2 (£,t)J P 


" P 


OO 

/[h(.t)y(t) - |-h 2 (5,t) p(5 »t| y fc j 


)d5 (2.13) 


In general, the solution of the nonlinear filtering problem requires 
that (2.13) be solved in real time, as the data y(t) is received. The 
boundary conditions on (2.13) are 


lim p(€,t|yr i) 

C ^o l J 


= Po (?) 


(2.14) 


where P Q (0 is the apriori density for the initial state at time t^, and 


,11m p(?»t|y rt t l) 

kl — [o^J 


= 0 


(2.15) 


The minimum mean-squared error Bayes estimate of the current state 
is given by 


x ( t) = E x 


CO 

<t,ly [Vt]| '/ 5 P<S>t|y [ t o- t ] 


) dg 


(2.16) 


§ 
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In general, it is not possible to find an ordinary differential equation 
obeyed by x(t) and forced by y(t) which can be solved directly without first 
finding P(5,t|y^ t because direct differentiation of (2.16) leads to an 

ordinary differential equation for x(t) which involves the unknown higher 

moments of p(?,t|y, t ,). In the special case when both f(£,t) and h(5,t) 

L o’ J 

are linear in 5, then p(£>t|y^ ^j) will be gaussian. In this case, one 

may assume 


pU>t 




f - * (t 3 2 ) 

2o 2 (t) ) 


(2.17) 


Substitution of (2.17) into (2.13) and matching coefficients of 
powers of (£-x) leads eventually to the well-known Kalman-Bucy filter equa- 

A 2 ^ 

tions for x(t) and o (t). The same approach applies when x(t) is a vector 
2 

and a (t) is a matrix. See Reference [15] for a typical derivation of the 
Kalman filter using this approach. In carrying out this approach, of course, 
the differentiation of (2.17) with respect to t and substitution in (2.13) 
is all done according to the rules of ordinary calculus, i.e., x(t) is 
treated just as if it were a deterministic function of t rather than a ran- 
dom process. 

2.4 THE ITO FORMULATION 

In order to recast these results in Ito form, it is necessary to begin 
by rewriting equation (2.3). Let 


w 2 (t) = f n 2 (x)dT 5 z (t) = y* y(T) dT , (2.18) 


and rewrite (2.2) as 


dz(t) = h(x(t) , t) dt + dw 2 (t) . 


(2.19) 


The segment 



t] 


contains exactly the same information as 



t]’ 


so now 
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one conditions on z ^ ^ rather than y^ ^y In place of the quantity 

defined in (2,3), we write p (g,t|z r . -.) . The subscript I stands for Ito. 

1 L V tJ 

The quantity defined in equation (2.6) is now replaced by 



( 2 . 20 ) 


In (2.20), the first integral in the exponent is to be interpreted as an Ito 
integral. 

In analogy to (2.7), define 


= E x J e i (x jt o ,t]’ z [t o ,t] > I x(t) = (2.2D 


In analogy to (2.8), we now have 





r I (! - z [ 

v 1 ! 

,0 

CO 

/ 

— oo 


j ,t)d5 


( 2 . 22 ) 


It is shown in Reference [19] that by applying the rules of Ito calculus 
to (2.20) and (2.21), it is possible to obtain the following forward equation 
for r I « >Z[t jt] ,t): 


d t r I 




dt + h(C,t)r dz ( t) 


(2.23) 


It should be noted that equation (2.23) is not obtained from equation (2.9) 
just by making the formal change of variables indicated by (2.18) and (2.19). 
The two equations are different because the partial differential with respect 
to t in (2.23) is an Ito differential, whereas the partial derivative with 
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respect to t in (2.9) is understood in the Stratonovich sense. Now define 


VjCt) = 


00 


,t)d£ 


.(2.24) 


Equation (2.22) becomes 


= (t) r (?,z 


[*••*] ,c> 


(2.25) 


Employing the Ito stochastic differential rule along with (2.24) and (2.25), 


one obtains 

d t p i = ^i 1 d t r ‘ h 2 r i d t h + 


/« t . 


t) rd£i 


-2 

hr 


.oo 

/ hu - 


t) r d£ 


dt 


(2.26) 


Define 


(2.27) 


oo 

h ( t) = J * h(£,t) P x (5,t|z fc )d? = ip 1 (t) J' h(5,t) r(C,z r ,t)d5 

-oo ° -oo t o J 

Equation (2.26) may be rewritten 

d t P I = d t r ~ r i d t^I + “ h(5,t)J dt (2.28) 

Equation (2.28) should be compared with the corresponding Stratonovich form, 
which is equation (2.12). After some manipulation and the use of (2.23) - 
(2.25) and (2.28), one arrives at the Ito form of the forward equation for 

p i (5 ’ t h t , t ] ): 


a 2 

d t P I { ~ H [ f(5 ’ t)p l] + 2 a? 2 | dt 


r^Sfe. 


m . 

■ ■ "V- 


+ £h(f[,t) - h ( ti) ] P x ^ dz ( t) - h(t) dt j (2.29) 


/% 

i. ,* "i 

H 
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i 


Again in the special case when both f(£,t) and h(g,t) are linear in 
in exact analogy to (2.17) one may assume 


PjU.tl z 


[*.•'] > 


- [ 


2tto 2 (t 


r 1/2 

)J exp ( 


_ k - x(t) 


2a 2 (t) 


(2.30) 


Substitution of (2.30) into (2.29) and matching coefficients of powers 

of (£-x) will again lead to exactly the same well-known Kalman-Bucy filter 

2 

equations for x(t) and a (t) , provided one uses the Ito calculus for comput- 
ing the stochastic differential with respect to t. This is a point which 

can lead to controversy (for example, see Reference [15]). For examples of 

2 

such a derivation in the case where x is a vector and o is a matrix, see 
References [24], [25], and [26]. The extension of this derivation to the non- 
linear case is also discussed in these references. 


2.5 THE ONE-DIMENSIONAL LINEAR PROBLEM 

In the linear time- invariant case, equations (2.1) and (2.2) become 

x (t) = a x (t) + n^(t) (2.31) 


y (t) = c x (t) + n 2 (t) 


(2.32) 


where a and c are constants. 

The Stratonovich form of the partial differential equation for the 
conditional density is equation (2.13). In the present case, this equation 
becomes 

§£ = - |f (a Sp) + | jfS + Ec5y(t) - I c 2 ? 2 ] P 

00 

- p J Cc?y(t) - \ c 2 l 2 l pdC (2.33) 

—00 

As suggested in Section 2.3, we assume a solution of the form of 
equation (2.17), and substitute it into the partial differential equation 

^ 9 

in order to obtain ordinary differential equations for x(t) and a (t). 

When this is done, the resulting equations are 
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(2.34) 


= ax(t) + ca 2 (t) [y(t) - c*(t)] 

^ [a 2 (t)] = 1 + 2a0 2 (t) - c 2 a\t) 

A boundary condition must be imposed on the solution of the partial diff- 
erential equation, as given in equation (2.14), In the present case, let 
us take 

P 0 (S) = 6(? - x c ) < 2 - 36 > 

i.e., the initial state x(t ) = x is non-random and known exactly. 

o o J 

In order to satisfy this boundary condition, we must apply appropriate 

corresponding initial conditions to equations (2.34) and (2.35) above. The 

correct initial conditions are 


i<t,) = X Q (2.37) 

a 2 (t ). 0 (2 - 38 > 

o 

Equations (2.34) and (2.35), subject to (2.37) and (2.38), may be solved by 
standard methods. The solutions are found to be 


2, . _ sinh X (t - t n ) 

' ' X cosh X(t - - a sinh X(t - t Q ) 

Xx + c I sinh \(t - t ) y (t)di 

x(t) = ° t ° 

x 7 o 

X cosh X(t - t ) - a sinh \(t - t ) 
o 7 v o 7 

where 

x, /7T7 


(2.39) 


(2.40) 


(2.41) 


Equation (2.40) makes it clear that a sufficient statistic on the past 
history of the data is the linear functional 



sinh X(t 


t Q ) y( T )<* T 
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2.6 A ONE-DIMENSIONAL NON-LINEAR EXAMPLE 


In order to understand the nonlinear case, it is convenient to have 
at hand a nonlinear problem for which the exact solution is known. Using 
the preceding results, we can manufacture such a problem. Our idea is to 
construct a nonlinear problem which is actually equivalent to the preceding 
linear one by making a suitable nonlinear transformation of the state vari- 
able in both the plant and the observation equations. 

Consider the nonlinear change of variables 

v(t) = sinh [x(t)] (2.42) 

This particular nonlinearity has already been discussed in pp. 14-20 of 
Reference [5], so we expect that its introduction is going to cause a diver- 
gence between the Stratonovich and I to forms of the nonlinear filtering 
equations. Although this makes the problem more subtle, no actual paradoxes 
will arise so long as we remember the content of Reference [5], 

Let us continue using the Stratonovich form of the equations. The 
plant equation (2.1) now becomes 

“ a ^1 + v 2 (t) sinh" 1 Lv(t)] 

+ ^1 + v 2 (t) (t) (2.43) 

The observation equation (2.2) now becomes 

y ( t) = c sinh 1 [v(t)] + n 2 (t) (2.44) 

Thus, equations (2.43) and (2.44) along with the nonlinear change of variables 

x(t) = sinh 1 [v(t)] (2.45) 

are completely equivalent to equations (2.31) and (2.32). However, we can 
pretend that we were given only eqs. (2.43) and (2.44), and asked to find 
the optimal nonlinear filter for this model. 

Let us write down the exact answer. Since the minimum variance Bayes 
estimate is the conditional mean, we have 
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v(t) = E {v(t) I y [t ^ t] } 

= E {sinh [x (t )] j ^ 

00 

= / sinh § P(C,t I y|- t t j) d? 

- ^ ~ x(t )] 2 

= [2n a 2 (t)] "^' 2 y sinh ? e ia^Tt) ^ (2.46) 

-00 

2 

In (2.46), a (t) and x(t) are of course still given by (2.39) and (2.40) 
respectively. 

Making use of a known result, we have at once from (2.46) that 


[ 2ji o 2 


- 1/2 r » 

(t)] / sinh § e 

— CO 


Cg - x(t )] 2 

2o 2 (t) 


d§ 


1/2 a^(t) 

e sinh x(t) 


(2.47) 


For emphasis, let us review what we have done up to this point. We 
know the optimal estimate x(t) for the linear filtering problem defined by 
(2.31) and (2.32). We make the nonlinear change of variables (2.42). By 
(2.46) and (2.47), we therefore find that the optimal estimate v(t) is 

v(t ) = e 1//2 ° ^ sinh Cx(t)] (2.48) 

This is, therefore, the exact solution to the nonlinear filtering problem 
defined by (2.43) and (2.44). 

2.7 THE FILTER EQUATIONS 

We now wish to explore whether this optimal estimate v(t) can be 
obtained as the solution to some nonlinear filtering equations analogous to 
the linear filtering equations (2.34) and (2.35) for x(t). In the present 
case because of the fact that the quantity 
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sinh \(t - t ) y(T)dt 


o 

is a sufficient statistic for computing x(t), it is also a sufficient stat“ 
istic for computing v(t). Consequently, in the present case there will 
exist some exact nonlinear filtering differential equations whose solution 
is v ( t) . 


This is to be contrasted with the general nonlinear filtering problem 
in which there does not exist a sufficient statistic, and consequently 
there are no exact (finite dimensional) nonlinear filtering equations. In 
the general case, the only exact equation would be the partial differential 
equation for the conditional probability density itself. 

The most direct way to find the exact equations for v(t) is simply to 
differentiate (2.48) directly, making use of (2.34) and (2.35). In doing 
this differentiation, we must be careful to recall the discussion in 
Reference [5]. If we differentiate (2.48) according to the rules of the 
Ito stochastic calculus, then we will get an Ito stochastic differential 
equation which must be solved by the rules of the Ito calculus in order to 
get the right answer. On the other hand, if we differentiate (2.48) 
according to the rules of ordinary calculus, the resulting differential 
equation will be in Stratonovich form and must be solved accordingly. 

We emphasize that we really are free to follow either route at this 

point. The correct form for the conditional density for x(t) is gaussian, 

2 

with variance a (t) and mean x(t) , as given by (2.39) and (2.40) respect- 
ively. This gaussian conditional density is simultaneously the Stratonovich 
solution of equation (2.13) and the Ito solution of equation (2.29), keeping 
in mind that, in the present case. 


f(€,t) = a? ; h(?,t) = c£ 


(2.49) 


Furthermore, in the present case the Ito and the Stratonovich forms of the 

- 2 

equations obeyed by x(t) and a (t) coincide; either way the correct equations 
are (2.34) and (2.35). Strictly speaking, of course, the change of notation 
represented by equations (2.18) and (2.19) should be made in order to cast 
equation (2.34) into the proper Ito form, but this does not alter the truth 
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of the statements we have just made. 


Let us continue, using the Stratonovich calculus for purposes of 
exposition. 


Differentiating equation (2.48) by the rules of ordinary calculus, 
we have 

/V — 0^ ( t) 

dv( t) 2 K J , r * s . \ i d r l 2,. v , 

— fa— = e smh [x(t>] — [■ j a ( t) ] 


1 2/ x 

+ e cosh [x(t)] ^ ^ 


(2.50) 


Substitution of (2.34) and (2.35) into (2.50), simplification, and use of 
the inverse of (2.48), namely 


yields finally 


dv(t) 

dt 



1 2 . . 

- 


-1 

- j o (t) 



= sinh 

e 

v(t) 



f 

1 

2 . 1 

-2 ■ 

+ v (t) < 

1 

( . , -1 
^ a sinh 

2 

e 

Q 

1 

\ 

r+ 

< > 
rt 

^ 'w' 


2 1 -1 
+ ca (t)ly(t) - c sinh 


1 2 , . 

~ 2 ° (t) ~ 
e v(t) 


(2.51) 


+ j v(t) [1 + 2a a 2 ( t) - c 2 a 4 (t)] 


(2.52) 


Equation (2.52) along with equation (2.35) thus constitutes the 
Stratonovich form of the optimal nonlinear filter for the nonlinear filter- 
ing problem represented by the Stratonovich eqs. (2.43) and (2.44). 

2.8 APPROXIMATIONS OF THE ESTIMATION EQUATIONS 

The solution of the general minimum variance nonlinear estimation 
problem is very difficult to mechanize. Consequently, it appears necessary 
to search for sub-optimal estimators which are easier to compute, and which 
still possess near-minimum estimation error. Since the exact minimum vari- 
ance sequential estimator is known for the linear problem with gaussian nois 
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namely the Kalman-Bucy filter, one approach to sub-optimal estimation is to 
attempt to find correction terms for the Kalman-Bucy filter for the case of 
slight non-linearity. The remainder of Section 2 will be devoted to a 
discussion of this approach to sub-optimal filtering. 

The literature on this problem may be classified in several different 
ways, according to the basic approach adopted for the estimation procedure 
and the underlying assumptions about the model. The approach to the 
problem may be either statistical or non-s tatis tical . In statistical pro- 
cedures, the existence of certain underlying probability distributions must 
be postulated, and statistical-type estimates such as maximum likelihood or 
minimum-variance Bayes are sought. In non-s tatis tical procedures, the 
problem is viewed as one of choosing an estimate which achieves the optimum 
curve fit to the actual data according to some criterion such as least- 
squares or Chebyshev. The underlying model may be either static or dynamic. 
For dynamic systems, the state space may be either continuous or discrete, 
and time may be taken either as a continuous variable or as a discrete 
variable. 

The compilation of a bibliography which is exhaustive in all of these 
categories would be an enormous task, and consequently will not be attempted 
here. A few isolated references to the literature will be given to illustrate 
some of the categories. Most of the cited references themselves contain 
fairly extensive bibliographies, so that the reader interested in a parti- 
cular area should at least be able to find a lead. 

It should be pointed out that much more of the published literature is 
devoted to theoretical formulations than to actual numerical solutions of 
particular problems. Consequently, it is not really possible to make a 
meaningful comparison among the various possible approaches and models which 
might be applied in a given situation. Therefore, there are only rather 
vague guidelines available to aid the uninitiated engineer in choosing a 
method of attack for his particular problem. Adding to the bewilderment 
is the fact that established workers in the field tend to have developed 
strong preferences for a particular approach, to the exclusion of all others, 
which they can justify only on a subjective basis. 

Any book on statistical estimation theory should serve for an initial 



orientation for static problems. One starting point might be the chapter 
by Balakrishnan in Reference [28] . 

References [8] and [6] treat the subject as a least squares curve 
fitting problem in discrete time. Section 6 is an interesting discussion 
of one aspect of the subject from this same standpoint. 

Reference [29] treats the subject as a problem of computing 
maximum likelihood estimates in continuous time, although it is also 
possible to interpret the approach taken there as continuous time least 
squares curve fitting. 

Ref erence [21] discusses a problem of continuous-time minimum-variance 
Bayes estimation of the state of a system with discrete state space. This 
reference shows particularly clearly how the discrepancy between the Ito 
and Stratonovich calculi appears in continuous-time stochastic estimation 
problems, a difficulty which does not arise in discrete- time problems. 

Finally, References [22 ] -[23] provide good starting points for investi- 
gating the problem of finding minimum- variance Bayes estimates of the state 
of continuous-time nonlinear dynamic systems with continuous state spaces. 

2.9 EXAMPLES OF SIMPLIFYING APPROXIMATIONS 

A close study of the simple example discussed in Section 2.7 should 
enable us to resolve some questions about the validity of various plausible 
approximations, at least in this special case. 

The crudest approximation is complete linearization. This means we 
expand all nonlinearities in the model, eqs . (2.43) and (2.44), in Taylor 
series, and retain only first order terms. The effect of this is to yield 
approximate equations for the model which look exactly like the original 
linear equations (2.31) and (2.32), with x(t) replaced by v(t) . Naturally, 
the exact Kalman filter for this approximate linearized model turns out to 
be given by equations (2.34) and (2.35) with x(t) replaced by v(t) . 

This approximation is tantamount to making the approximation 

sinh [x(t)] « x ( t ) (2.53) 

in (2.42) and so naturally we come out with the approximate estimate 
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v(t) « x(t) 


(2.54) 


It should be pointed out that the above procedure is not equivalent 
to first deriving the exact nonlinear filter (2.52) and then linearizing 
it . This can be seen by comparing eqs . (2.42) and (2.48). Whereas the 
linearization of (2.42) yields (2.54), the linearization of (2.48) yields 


v(t) 


cr 2 ( t) 


x(t) 


(2.55) 


Consequently, linearization of (2.52) must lead to (2.55) rather than (2.54). 

Another approximation frequently suggested is the so-called maximum 
likelihood filter (e.g., see Reference [29]). Although this filter can be 
obtained through a formal derivation from basic assumptions, it amounts to 
just retaining the form of the Kalman filter, eqs. (2.34) and (2.35), but 
replacing the linear terms in eq. (2.34) by the corresponding model non- 
linearities from (2.43) and (2.44). In the present case, the result is of 
the form 


- a \/ 1 + V 2 (t) sinh 1 [v(t)] 
at 

+ ca 2 (t) | y ( t ) - c sinh 1 [v(t)]j (2.56) 

Although this resembles (2.52) somewhat, it is simpler. 

Incidentally, it should be pointed out that according to maximum like- 
lihood estimation theory, if it is known that x(t) is the maximum likelihood 
estimate of x(t) , and if it is known that equation (2.42) connects v(t) and 
x(t) , then necessarily the maximum likelihood estimate of v(t) is given by 

v(t) = sinh [x(t) ] (2.57) 

Thus, we may compare equations (2.48) and (2.57) to see the theoretical 
difference between the minimum variance Bayes estimate and the maximum 
likelihood estimate. 

This comparison is meaningful in the present case only because of the 
fact that, for the linear, gaussian problem to which the Kalman filter applies 
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exactly, the minimum variance and maximum likelihood estimates coincide. 

Thus, the x(t) given by (2.40) is simultaneously the minimum variance and 
maximum likelihood estimate of the x(t) generated by (2.31). 

Incidentally, equation (2.56) should not be taken too literally. It 
is only intended to indicate the form of the filter equations which results 
when the procedure described in Reference [29] is applied to the nonlinear 
system (2.43). However, this procedure encounters difficulties which do 
not occur in Ref erence [29 ] , because of the coefficient >/ 1 + v 2 (t) multi- 
plying n^(t) in (2.43). As pointed out in Reference [5], the presence of 
a state dependent coefficient multiplying white noise immediately leads to 
the S tratonovich-Ito divergence. As a matter of fact, the correct form of 
the likelihood functional, given by eq. (5) of Reference [29], is no 
longer clear when there are state dependent nonlinearities. Therefore we 
should say only that the procedure described in Reference [29], when extra- 
polated in a plausible way, appears to lead to a filter equation having 
the form represented by eq. (2.56). 

2 

For the same reason, the coefficient a (t) in equation (2.56) is 

probably not the same function given by eq. (2.39) and obtained as a 

solution of eq. (2.35). The procedure described in Reference [29] can be 

made to lead to a differential equation of Ricatti type for this coefficient, 

- 2 

but it will contain terms which depend on v(t), thus coupling the a (t) 

/ 2 

and v(t) equations, because of the presence of the factor vl + v (t) 
multiplying n^(t) in (2.43), as just mentioned. 

For reasons given earlier, it seems correct to believe that the 
v(t) given by (2.57) is the correct maximum likelihood estimate of v(t) . 
However, direct differentiation of (2.57) and use of (2.34) does not yield 
(2.56). Again, there is a discrepancy caused by the factor yj 1 + v^(t). 

A detailed investigation of the nonlinear filtering problem when the 
noise enters with a state-dependent coefficient appears difficult. Such an 
investigation was felt to exceed the scope of the present effort. Conse- 
quently, we content ourselves here with pointing out some of the questions 
which arise in such a case. 
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3. MAXIMUM LIKELIHOOD ESTIMATION, CONTROL AND ERROR ANALYSIS 
3 . 1 INTRODUCTION 

Sequential orbit determination must be carried out in real time for 
space missions so that control (guidance) corrections can be applied. The 
method presently employed is straightforward. A model of the motion of the 
spacecraft is constructed, and probability density functions describing the 
random behavior of initial conditions, trajectory disturbances and data 
noise are postulated. Then given a data record at time t^, consisting of 
all data obtained up to estimates of the initial conditions and dis- 

turbances are calculated by applying an iterative Newton-Raphson technique 
to find the root of the differentiated likelihood function. The control 
(guidance) is then determined by treating these estimates as though they 
were the true values for a deterministic system, and error analysis is per- 
formed by linearizing the equations of motion about the estimated trajectory. 
This intuitively reasonable procedure has been used with much success. It 
is the purpose of this Section to discuss the theoretical justification for 
the presently employed estimation, control, and error analysis algorithms. 

The results will not apply to the most general form of the stochastic con- 
trol problem, and an exception will be discussed in Section 3.2. 

It should be noted that a sequential estimation procedure of this type 
does not permit the control to be applied immediately after receiving the 
last data point, because some time is required in order to process the 
total data record. This consideration has thus far not been important in 
orbit determination and guidance problems, where efficient numerical tech- 
niques have been developed. Considering the capability of present-day 
computers, and the many simplifications which can be introduced if one is 
willing to accept some approximation error, there is no reason to believe 
that the computational time lag will be a limiting factor for future missions. 
In any case, this is a practical problem which can only be discussed by 
referring to a specific application, recognizing that the alternatives to 
the maximum likelihood algorithm are the numerical solution of a nonlinear 
partial differential equation, or some approximation of that solution, or 
dynamic programming. 
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3.2 OPEN LOOP MAXIMUM LIKELIHOOD CONTROL 


The combined estimation and control problem might be best understood 
by first considering a relatively simple example of open-loop maximum like- 
lihood control, where the control correction is based upon only apriori 
information rather than data gathered during the mission. 

Suppose that at time t Q a single correction impulse vector u is to be 
applied so as to cause the "most likely" state at the final time T to be a 
desired value. Let the equations of motion be 


dx 

dt 


f(x, t) 


t < t ^ T (3.1) 

o 


where x is the state vector. The initial condition at t 


(+) 


XS X 


where 


x is a Gaussian vector with covariance matrix A and mean equal to 
o o 

[y + Ku] . The K is supposed to be a given matrix. Thus the control u 


impulsively changes the state at t^ according to 


x 

o 



) 



) + Ku 


(3.2) 


where x(t^ is a Gaussian vector with mean equal to y and covariance . 
Assuming a one-to-one mapping of x q to the final state x^ = x(T) of the 
form x T * g(x Q ), the probability density function of x T is 


P T (x T ) = c exp j- j [q(x T )] T A q 1 [q(x T )] 



(3.3) 


where c is the coefficient of the Gaussian density function of x q , and 


-1 


q(x T ) = g (x T ) - (Ku + p) 


(3.4) 


3x q 3g 1 (x T ) 

— \ = determinant |— ^ 


(3.5) 


Define the likelihood function 


L(x t ) = - In p T (x T ) = j q(x T ) T A q 1 q(x T ) 
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- In 


8x 

c 


- In c (3.6) 



8x o 9x T 

But |- — | is the inverse determinant of the state transition matrix |r — | , 

dx T o 


and it can be shown that (see [30], pp. 28) 


= exp <- 


j-/ trace [•— (t; x o (x T ))]dt| 


(3.7) 


Then the most likely value of x^ is that value x^ which satisfies 


= 0 = A o _1 q(x T ) + f trace [f|] 


(3.8) 


But for hamiltonian systems where the state transition matrix is symplectic 

• @Xrri - 9 f 

(see [31], pp. 306) it can be shown that |- — L | = 1 and hence trace [— ] = 0. 

oX aX 

O 

In this case we have q(x T ) = 0, and 


Ku = g (x T ) - y 


(3.9) 


In other words, for any desired final state x T we can find the control u 
from (3.9), just as though the initial conditions were not random, (i.e., 
hamiltonian systems can be treated as though they were deterministic) 
Although this property is also assumed in present orbit determination work, 
where data is processed and an estimate of X q is obtained before the control 
is applied, it is not at all clear that such a procedure is theoretically 
justified. One purpose of the discussion to follow is to show that under 
certain conditions, such a separation of the estimation and control problems 
is indeed legitimate. 


Clearly the simple result (3.9) does not apply when trace [— ] ^ 0, 
in which case a statistical "bias" must be introduced. This should not 
be an unexpected conclusion, for a similar result applies if one sought to 
control the expected value E[x T ]. For example, in the simple case 
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dx I , 

dT ’ “ ' f i 


dx_ 

— - = 0 = f 

dt " 2 


with x 2 (0) = x 1 2 (0) 

x 1 (T) = aT + x 1 (0) 
x 2 (T) = x|(0) 

and 

E(x 2 (T)) = o 2 


where a ^ is the apriori variance of x^(0). Note that we have 

trace (Jf) = trace [J J] =0 
2 

and even so the statistical bias occurs. One should not apply the 

maximum likelihood algorithm in this case, however, for the assumption of 
a one-to-one mapping between x^ and x^ is violated. 

3.3 A MAXIMUM LIKELIHOOD ESTIMATION AND CONTROL TECHNIQUE 

In this Section we shall formulate a general maximum likelihood esti- 
mation and control technique which leads to a algorithm which is consistent 
with present practice in orbit determination and guidance. Consider a 
discrete, nonlinear dynamic system described by the vector differential . 
equation 


dx(i) = f (x(i) , u ± , t ± ) dt + G(t ± ) dw ± 


0 < t < T (3.10) 
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where is one of a closely spaced set of discrete times {0, t^, t^ ••• 
t^, •• t^ t^ = T), x(i) is the dynamic state at time t^, dx(i) = x(i-KL)- 
x(i), u_^ is a constant control applied in the interval dt, and dw^ is the 
state disturbance applied in the interval dt.f To sidestep mathematical 
difficulties we shall assume that dx(i) is defined by (3.10), where dw(i) 
is state noise in the Stratonovich sense. At each time t. data will be 
obtained which is of the form 

z. = h(x(i), t.) + n. (3.11) 

1X1 

where n_^ is data noise. For simplicity we shall assume z^ is a scalar. We 
assume that the control u^ applied at each time t^ is a function of the 
entire data record up to t^, where this function is supposed to be specified 
apriori. Defining the data vector 

z T (i) = [z o , z 1 , z 2 •• z 1 _ 1 , z ± ] (3.12) 

we have 

u T (z) = [u o (z(o)>, Ul (z(l)), u 2 (z(2)) u^(z(N) ) ] (3.13) 

Let the total state of the system be 
T 

y Q = [x Q , dw Q , dw x , dw 2 , • • • dw^ ] (3.14) 

Since x(i) depends upon y^ and {u^, •• u^_^}, dt follows that, for any 

specified function u(z) , the u and z are implicitly complicated functions 
T 

of y^ and n = [n^, n^, • *n^] . In particular, if z(i) is written in the 
form 

z (i) = u ( i " 1 )] + n (i) (3.13) 

where H. is a vector function and 

i 


tThe state may include certain constant "acceleration" parameters x, , 
in which case dx^. = 0. 
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( 3 . 16 ) 


T 

n (i) = [n Q , n^, n 2 , n ± ] 

T 

u (1-1) = [u Q , u 1# u 2 , 

then the partial derivatives of z(i) with respect to y Q are obtained by the 
recursive relation 


d z ( i ) *} faH.l 

r sH i i 

rsu(i-i)i 

f"3z(i-l)1 

L^rJ * Ki ' 

Lsff i-dJ 




3z(o) 1 

‘3H " 



_ Sy o J ' 

lK. 



( 3 . 17 ) 


( 3 . 18 ) 


Similarly, 


raz(i) 

- j 

3u( i-l) 

T 3z(i-l)l 

dn(i) 

3n. 

Lid 

_SuCi-l)J 

3z(i-l) 

[**± J 

bn . 

L l d 


where n_^ is one component of n, and 


( 3 . 19 ) 


Sz(o) 

- 3n i 


\ 1 if i = 0 

( 0 if i / 0 


From these expressions 


du(l) ~ 

- 9y o J 


and 


L 8n i J 


can be calculated. 


Suppose that an apriori probability density function of y Q and the 

total noise vector n = n(N) is available, denoted by P o (y Q , n) . Let the 

final state be y^, = [x(T), dw^, dw^] , and assume that there is a 

one-to-one mapping of (y^, n ) to (y^, z) , where z is the total data vector 

z(N). Then 

y T = g(y Q » u) 

where u is the total vector u(N), which is a function of y^ and n. 
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Applying the inverse relationships, the apriori probability density function 
of the data and final state is 


p T (y T ,z) - p 0 (y 0 (y T , *>, »(*,, *)) | 

where | . . | is the determinant of the inverse Jacobian matrix 



(! 

if 

-( 

!h_\ 

?yj W\ 3z / 


(3.20) 

1 /lz \/is VViz\ 

L W W 0 /\ 9 y 0 / \ 3r V 

M ' 1 

\dn J 

I + 




where I is the identity matrix. Then 


My Q> n) 


9g 

-1 

bz 

9(y T y z) 






But from (3.19) 




1 0 0 0 0 . 

x 1 0 0 0 . 

x x 1 0 0 

x x x 1 0 . 


(3.21) 


(3.22) 


Since (if) (if) (if) can be 

zeros on the main diagonal. 


shown to be a lower triangular matrix with 

Then |-|^| = 1 and (3.21) becomes |-|^ | 

L y o 
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This is the inverse of the determinant of the well known deterministic state 
transition matrix (including the dw variables) , evaluated for a given 

y 6 (y T * z ) and u(z). 


Suppose that at some time t we are given the data vector z(K) and 

K 

the previously applied control u(K-l). The probability density conditioned 
upon z(K) is 


p k (y t j 


z(N-K) 1 z(K)) = 


p T (y T > z ) 

> zK (z(k)) " 


(3.23) 


where 


P zK(z(K) ) 


oo OO 



— oo 


2) dy T dz(N-K) 


(3.24) 


and z(N-K) is the data to be obtained after t . Let the superscript ~ 
denote the maximum likelihood estimate. Define the likelihood function 

L(K) = In [p K (y T , z (N-K) | z (K) ) ] = In p T (y T> z) - In p zK (z(K)) (3.25) 

Then y T (K) , the anticipated data z(N-K), and the present plus anticipated 
control u(N-K) are obtained as the values which simultaneously satisfy 


0 = 


0 = 


|~3L(K) 1 

p(ln P T )"| 

L9 y T (K)J" 

L 9 y x (K) J 

1" 9L(K) 1 

pa (In P T )-| 

|_3z(N-K)J 

'[ 8z (N-K) j 


(3.26) 


(3.27) 


where u(N-K) = [u^., u^^, • • • = u(z(N-K)) (The P^ K (z(K)) does not 

enter into (3.26) or (3.27) because z(K) is given.) The instantaneous 

u(K) is then the first element of u(N-K). This algorithm is supposed to 

be applied at all times t where data is introduced. 

K 

Note that there is another maximum likelihood estimation procedure 
which can be devised. Suppose that the subset of components of y T which 

~ T T T 

appear in the performance index P are denoted by y , where y = [y , y ] . 

1 1 2 
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Define the marginal distribution 


P K ^ y T, 


z(N-K) | z (K) ) 



P K (y T , z(N-K)|z(K))dy T2 


(3.28) 


Then equations similar to (3.26) and (3.27) are developed but with p* 

replacing p and y replacing y . In this case the uninteresting components 
K T 

y are integrated out prior to finding the estimates and the control, and 

we are controlling the ’’marginal mode” of the trajectory. If the entire 

y vector is estimated and the components y are selected, which occurs 
I i ^ 

when p is used, we are controlling the ’’modal trajectory”. It can be 
shown that these two techniques yield identical results when the system 
is linear, otherwise they do not. Control of the marginal mode has the 
most esthetic appeal, but for computational convenience we shall work with 
the modal trajectory. Essentially, we wish to avoid calculating a proba- 
bility density function in real time. 

3.4 ON THE SEPARABILITY OF ESTIMATION AND CONTROL 


The combined estimation and control algorithm developed in Section 3.3 
might not be practical to apply. Analogous to the open loop control problem 
discussed in Section 3.2, where the solution can be divorced from statistical 
considerations, we seek to introduce assumptions which allow the estimation 
and control problems to be separated. That is, we wish to justify a pro- 
cedure whereby the initial condition y can be estimated at t , given data 

up to t , and this estimate can be treated as though it were the true state 
K 

of a deterministic system for the purpose of computing the control. 


Suppose that dt in (3.10) is sufficiently small so that the state 

transition matrix -] can be represented by its continuous analogue. 

3y o 

Then, as in Section 3.2, 

r 1 


it 


= exp 


-J 


trace [~ (x(t; y^; u)>t]dt 


(3.29) 


9 f 9 2 

For many applications it is true that trace [— ] = 0, and hence = 1. 


For example, this condition applies when the continuous form of (3.10) is 
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a Hamiltonian system. Then let us assume that 


(A) 


3(y o ,n) 


ia_ 

3(y T »z) 


3y 
J o 


-1 


= 1 


(3.30) 


(B) the apriori probability density function P Q (y Q , n) 
is of the form 


p (y , n) = p (y ) p (n) 
r o J o 9 J o r n 


(3.31) 


where y and n are both Gaussian variables with zero mean. 
; o 

(C) the n_^ are uncorrelated, i.e. 

T 

E[n n ] = T = diagonal 


Then, using (3.15), 


- In p T (y T ,z) 


1 ( T .-1 
' 2\ y o A 


y + (z-H> r 
o 


1 (z-H)j 


(3.32) 


where A and T are, respectively, the apriori variances of y^, and n. The 
y^ and H(y^, u) are considered to be implicit functions of y^ and z. 
Because of the form of (3.32), the likelihood equations (3.26) and (3.27) 
become 


/ T a " 1 

= K A - 


[ z-H ] [r‘ 


A . / 3H \ 

/9u(N-K)\l 

)P y o" 


J yau(N-K) J 

V 9 ^o /] 

(KJ 


\ , ( 3H. \ 

/ 9u(N-K)\7 

If 3y o 1 

J + hu(N-K) 1 

l )\ 

' 9 z (N-K) 


T -1 

+ [z-h 1 ] [r x ] 


r. .i 

|l(N-K) J 


(3.33) 


(3.34) 


- 32 - 



where I (N-K) is the (N-K) dimensional identity matrix. But 


8h k 

au(N-K) 


(3.35) 


Let y be a solution of 
; o 


0 = 



[z(K) - HjJ 1 



(3.36) 


Then, because of (3.35) and assumption (C), it follows that (3.33) and 
(3.34) are satisfied by this value of y^ along with 

z (N-K) = H n _ k (y Q , u(K) , G(N-K)) (3.37) 

where u(N-K) the anticipated control, which depends upon the control law 
u(z), and u(K) is the previously determined (fixed) control. 

Since we have as yet made no assumptions about the apriori specified 

function u(z) , it is now legitimate to define the anticipated control 

u(N-K) as a function of y T (K), where y T (K) is the solution of (3.10) with 

y (K) used to define the initial conditions x and the disturbances dw. . 

J o o 1 

The u(N-K) might be determined, say, by minimizing some performance index 
J(y T (K), u (N-K) ) . Then 


9J 

9u(N-K) 


0 


(3.38) 


defines the control. For example, (3.38) would be the Euler-Lagrange 
equation if the control were continuous and the system were deterministic. 
This u(N-K) has the property that at each time t some function of the anti- 
cipated maximum likelihood estimate of the final state is minimized. Since 
u^ at every time t^. is in this way a function of y^QC), which in turn is a 
function of z(K), it follows that u(z) is a well defined function. Then 
the combined estimation and control procedure is as follows: 

(a) find y Q from (3.36) 

(b) using this value in (3.10) and (3.38), find the anticipated 

control u(N-K) and hence u(t ) as the first element of u(t ) 

K K 
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Equation (3.37) can be used to determine z(N-K), but this is not re- 
quired to define u(N-K) . Note that z(N-K) is the data which would be real- 
ized if y did describe the true values of x and the past and future dis- 
o o 

turbances dw_. , and if the future data noise were to be zero. Thus z(N-K) 
is the data calculated at times t R < t_^ ^ t^ on the modal trajectory defined 
by y Q (K) and u(N-K). 

The estimation and control equations developed here might be simpler 
to implement in continuous, rather than discrete, form. As pointed out 
above, the continuous form of (3.38) is the Euler-Lagrange equation. The 
continuous form of the estimation equation (3.36) including the effects of 
continuous state noise, will be developed in Section 5. 

3.5 THE MOMENTS OF THE CONDITIONAL ESTIMATION ERROR 

For error analysis purposes it may be necessary to compute at time t^ 
certain statistics of the estimation error = y - y (K) associated with 
the estimate y^. To avoid the necessity of deriving a complicated error 
probability density function, an approximate expression for the moments of 
e can be developed, which becomes exact under certain asymptotic conditions. 
For convenience we will henceforth drop the subscripts K and 0, and e will 
be understood to mean e = [y^ - y Q (K)|z(K)]. 

Suppose Ladt the estimation equation (3.36) is written 

o = A -1 V - A T (y) r _1 [H (y) + n - H(y)] (3.39) 

where H(y) is understood to mean H^(y, u(K-l)), with u(K-l) fixed at its 
predetermined values, and 


A (y) 



(3.40) 


Then (3.40) implicitly defines 
be expanded about an arbitrary point 


the random variable y (y,n), which can 
y = a, n = b in the form 
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y (yj>n) = y (a,b) + 


|f“ (a * b) 


(y-a) 


IxT 


(n-b) 


+ higher order terms 

Applying the implicit function theorem, (3.41) yields 


(3.41) 


ay 



“““ J 

■ 

L 

dn 



- 1 + A T (y) r 1 A(y) - B(y,n,y) 


■ 1 ,A. —— 1 


-1 


A T (y) r -1 A(y) 


A T (y) r 1 


(3.42) 


(3.43) 


where, assuming T is diagonal, B is the symmetric matrix with elements 


B. 




s r. (y,n,y) t 


.-1 

kk 


3 H k (y) 

by. . 

L * j 


and r is the residual vector 


r(y,n,y) = H(y) + n - H(y) 


(3.44) 


(3.45) 


The higher order terms can be evaluated by repeated differentiation of 
(3.42) and (3.43). 

Suppose we are given a data vector z which yields the estimate y. 
Choose a = y and b = z - h(a) in (3.41), which implies that y(a,b) = a. 
This value of b is the best estimate of n, given z. Holding z fixed, so 
that y = a is fixed, the estimation error conditioned upon z is £ = y - y 
= y - a. Then from (3.41) we have 
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0 = 


sy_ 


’SF' (a,b) 


£ + 


-^-Ca,b) 


(n-b) 


(3.46) 


+ higher order terms. 


The series (3.46) implicitly defines e (n-b). Since e(o) = 0, we can apply 
(3.46) to obtain 


e - 




n-b 


<p _i 

a 1 r a 


+ higher order terms in (n-b) 
-1 


T -1 
A l T 


(n-b) + ... 


(3.47) 


The moment generating function for e is 


M 


OO 

,(©) = / exp (© e(n-b)] P n ( n ) d.n 


.-1 


/ T m r~ 

exp [6 e (n-b) - n - n] dn 

—co CL 


(3.48) 


where 0 is a parameter vector and c is the coefficient of the Gaussian 
density function p^. The p t ^ 1 moment of the i^ component of e is then 
given by 

(3.49) 

0 

Alternatively, one could evaluate the moments of e directly from (3.47), 
since n is a zero mean, Gaussian variable with known covariance. In either 
case it would be necessary in practice to delete some of the higher order 
terms in (3.47). This can be justified if terms of the form 


E [ £?] = 


^d P M (ej\ 


d©. 

1 


© = 
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! 



are negligibly small for large p, which is implied if the variance of e is 
small. 

Note that the first moment E[e] is the difference between the maximum 
likelihood and minimum variance estimates, which is defined by the property 
that E[e] = 0. Thus the minimum variance estimate is {a 4* E[e]}. If only 
the linear term in (3.47) is significant, as would be the case if the 
asymptotic conditions to be discussed in Section 3.6 apply, then both esti- 
mates are (approximately) the same, and both produce a Gaussian estimation 

T -1 

error with variance (AT A) . This expression will be developed in 
Section 3.6. 

3.6 THE ASYMPTOTIC MOMENTS OF THE CONDITIONAL ESTIMATION ERROR 

A simplified (truncated) form of (3.47) can be devised under certain 
,f asymptotic ,f conditions, which can apply if state noise is not present. 

Assume that the system (3.10) is observable and asymptotically well condi- 
tioned, where 

Definition; A dynamic system is said to be completely observable at 
time t R if the normal matrix (information matrix) has full rank for all 
y, where 

normal matrix = [A^(y) r 1 A R (y) J = N K (y) (3.50) 


A system is said to be asymptotically well conditioned if the eigenvalues 
of N (y) go to infinity as t__ goes to infinity. 

Definition: The asymptotic form of (3.47) is the series obtained by 

— 1 2 

deleting terms of order |n | 


The linear terms of the asymptotic form are as given by (3.42) and 
(3.43). The higher order terms can be evaluated by noting that 



(3.51) 
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Then if 



terms of order |n' 


it follows that (symbolically) 





(a,b) 



where 



involves terms of the type 



since 


Sr 

^7 


(a,b) 




Similarly, 


dy 

bn 




+ W - B 


-1 

A 



r 


-1 


so that (symbolically) 


5 y 

a 2 
3n 


7 (a,b) 


terms of order (Tif h 


0 


-,2 - 

9 y 

3nby 


(a,b) 



( 3 . 52 ) 


( 3 . 53 ) 


( 3 . 54 ) 


( 3 . 55 ) 


( 3 . 56 ) 
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Thus to second order (3.46) is approximately linear in (n-b) and quadratic 

i 3 ^ | 

in e. If the cubic terms in (3.52) are negligible, and if A i s Q f order 

-1 ' 3y I 

N , then only the linear terms of (3.47) are significant y and 

e = N -1 A T r _1 (n-b) (3.57) 

The estimation error is then (approximately) a Gaussian variable, with 

E[e] = - N _1 A T r " 1 b = - N _1 A -1 a (3.58) 

T -1 -1 

since AT b = A a. This is the correction term which makes the estimate 
for a linear system unbiased over the ensemble of all data realizations 
(Recall that e is the error obtained holding the data fixed). The variance 
is 


-1 -1 2 -1 T -1 T -1 -1 

E[e - N A a] =N A T A N =N (3.59) 

Assuming the linear approximation is reasonable, the asymptotic mean of e 
could be obtained without inverting the series (3.46) by taking the expecta- 
tion of (3.46) and substituting E[e^] as obtained from the Gaussian approxi- 
mation for all p > 1. 

3.7 THE ASYMPTOTIC STATISTICS OF THE ESTIMATION ERROR FOR PARTIALLY 
OBSERVABLE SYSTEMS 

An asymptotic form of the error statistics can be developed for some 
components of the state of a partially observable system, where the normal 
matrix does not have full rank. This situation is to be expected when 
state noise is present. 

Suppose the state vector is decomposed into two parts according to 
y T = [y^, y^l , and let (3.42) and (3.43) be written 
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3n 







*2 = 



W 11 W 12 


< A I 

r -1 a x ) 

/-s 

-3 

H 
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ro 

1 

i 

H 

ro 

t 

II 

<4 

r 1 a 2 ) 

(Ag r 1 a 2 ) 



M 11 M 12 


^ A ll + N 11 

‘ B ll^ 

(^12 " B 12^ 

M 

1 

OJ 

aF 

rH 

a? 

1 


( N 21 ' B 21^ 

Ca 2 2 + 

N 22 " B 22^ 


(3.60) 

(3.61) 

(3.62) 

(3.63) 

(3.64) 


(3.65) 
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In (3.65) we have assumed that and are apriori uncoupled, so that 
A ^2 = 0 and ~ 0* From (3.60) and 3.61) we have 



M ii H n 


+ K (1C?- M 


'21 ”11 W 11 " N 21 ^ 


(3.66) 



M U "22 * K < M 21 M li 


N 


12 


- n 22 ) 



A 1 1-1 + K («21 




r 


.-l 


where 



(3.67) 


(3.68) 


(3.69) 


Suppose that there is a weak coupling between y^ and y^ s in the sense 
that N ^2 ~ 0, B 12 ~ 0, and hence ~ 0. Then 



(3.70) 


(3.71) 


(3.72) 


In particular, this is the case if y ^ is the unobservable portion of 
the state vector, for then A 2 = 0. Thus if asymptotically goes to 

infinity while ^2 remains small, we assert that the analysis of Section 3.4 
can be applied essentially unchanged to the observable components of the 
state. This conclusion is important when there are many system parameters 
to be considered, e.g., when state noise is present. 
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3.8 APPROXIMATE DIFFERENTIAL EQUATIONS FOR THE ESTIMATION ERROR BIAS 

In Section 3.6 it was shown that the conditional estimation error 
becomes asymptotically Gaussian, and it was suggested that the error bias 
(mean) in the non-asymptotic case (finite normal matrix) might be approxi- 
mated by using the Gaussian assumption to calculate the higher moments in 
the (3.46). Consistent with this approach, it is possible to derive an 
approximate differential equation for the estimation error bias. The 
method can only be rigorously justified for observable systems with no 
state noise, but the effect of state noise will be included for completeness. 

A A 

Let x and dw(t) be the estimates of the initial conditions, deter- 
o 

mined from data z(K), where t £ t . Define 

Jx 

e(t) = x ( t ) - x(t) 
d[6w(t)] = d[w(t) - w(t)] 

Expanding the right hand side of (3.1) about [x(t) , dw(t)], and (for sim- 

plicity) deleting the control, we have 


de . 

— = y 

dt 4" 

3 


if. 


3x 


1 ^ 

~ (x,t) 


:j(t) + 2 £ 


3 2 f. 

r 

3x . 


(x, t) 


(t) e fc (t) 


+ higher order terms + — [6w_^(t)] i,j,h=l,..n 


(3.73) 


Let E[e^(t)e_. (t) ] = (t) . Then introducing the Gaussian assumption for 

the purpose of calculating moments on the right hand side of the differen- 
tial equation (e.g., first moment is zero, etc.), we have 


de , 


dt 


= 2 S \ 3 V*J( p J k ) 

+ 1_ y ( V , 

4! , \ 9x.9x. Sx n 8x / \ P jk ^lm/ 

j ,h,lm ' j k. 1 m / 1 


+ terms of order (p., ) 

J R 


i,j • -n 


(3.74) 
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I 



+ terms of order (p . . ) + y . . 


(3.75) 


where 


Y i j(t) dt = E[{d6w^(t)} (d6w^(t)}] 

Consistent with the Gaussian assumption, the initial conditions for (3.74) 
and(3. 75)are, respectively, Efe^O)] = 0 and E[p_^(0)] = where is that 

portion of the normal matrix of (3 . 50) which describes the initial conditions. 
Similarly, the y^.(t) would be calculated as though the system were linear. 
This should be done by finding the error covariance associated with the 

d^ 

continuous estimator of (— ) , to be discussed in Section 6. Alternatively, 
one could find an approximate y _ (t) from the well known Kalman filter 
equations . 

Note that this method of finding the estimation error statistics 
differs from other approximation techniques employing Gaussian moments in 
that the modal trajectory x(t) is used as the reference, rather than the 
apriori trajectory. Recalling the discussion of the asymptotic properties 
of the estimation error of Section 3.6, it appears that moment truncation 
can only be justified if one expands about the modal trajectory. This will 
be the basis for the linearized analysis to be presented in Sections 7 and 8. 
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4. ESTIMATION OF UNKNOWN ACCELERATION 


4 . 1 INTRODUCTION 

The maximum likelihood estimation algorithm developed in Section 3 
requires that the initial conditions and the random acceleration function 
be simultaneously estimated. The resulting solution includes the "smoothed" 
(rather than "filtered") estimate of the acceleration function at every 
instant in time, based upon the entire data record. This property, plus 
the fact that linearity is not assumed, distinguishes this approach from 
the treatment of state noise in Kalman filtering. The Kalman form of the 
estimator can be extended to treat the linear data smoothing problem, how- 
ever, as is shown in [32]. 

This type of problem arises in many applications, such as the recon- 
struction of lift/drag histories for re-entry vehicles, or the analysis of 
trajectories perturbed by random thrust acceleration, or the determination 
of orbits which are perturbed by model errors of unknown origin. Probably 
the most important application is the latter. In this case one assumes 
that an unknown acceleration is acting, pretends that it is white noise 
with some hypothetical variance, and recovers an estimate of the unknown 
function which is hopefully a reasonable approximation of physical reality. 

In this Section we shall develop a continuous expression for a 
more general form of the estimation algorithm presented in Section (3.5), 
by using a variational approach (see also [33]). A practical numerical 
algorithm for solving the problem will be developed, as well as a successive 
approximation technique which may be adequate for some applications. A 
simple example will be discussed, and the associated estimation using the 
algorithm to estimate non-white acceleration will be analyzed. 

4.2 FORMULATION OF THE PROBLEM 

Let the equations of motion be 

dx = f(x, t ) dt + G(x, t) dw (4.1) 

where x is the n-dimensional the state vector, composed of the position and 
velocity coordinates plus all parameters which affect the tracking problem 
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(f^ = 0 if x, is a constant parameter); the m-dimensional vector 
dw 

cd(t) = [“ — (t) ] is the random (unknown) acceleration function, and G(x, t) 
at 

is a matrix which depends upon the state and time. For example, if x is 

a six dimensional vector, where (x^, x^, x^) are velocity components and 

(x^, x,-, x^) are position components, then a possible form of G is [•?•], 

where I is the 3x3 identity matrix. In this case n = 6 and m = 3. Let 

th 

the differential of the i— data type (e.g., the incremental change of the 
doppler integral) be given by 

dpi = h^(x(t), t) dt + dn^Ct) i = 1, . . . k (4.2) 

where k is the number of data types, h^(x(t), t) is some nonlinear function 
of x and t, and dn_^(t) is data noise. Suppose that w^(t) and n^(t) are 
uncorrelated, zero mean, Gaussian stochastic processes, where, at each time 
t, 


E 




i = 1, . . . m 


E 




i = 1, . . . k 


Thus w^(t) and n^(t) are Wiene t processes. Furthermore, suppose that the 

apriori distribution of the initial state x is Gaussian, with mean p and 

o 

covariance matrix A. Considering the vectors dw and dn composed, respect- 
ively, of the dw(t.) and dn(t.) at the discrete times {t , t, , . . . t . , . . . t._} , 
y l i o 1 i N 

the joint apriori probability density function (p.d.f.) for (x^, dw, dn} 
is of the form 


[p.d.f. (x q , dw, dn) ] = [p.d.f. (x q ) ] • [ p.d. f . (dw(t Q ) ) ] • • • 

• • • [p.d.f . (dw(t N ) ]• [p.d.f . (dn(t Q )) ] • • • [p.d. f . (dn(t N >) ] (4.3) 


where each [p.d.f.] is Gaussian. Let z^ = 0^" ) » Taking the negative of 

the logarithm of (4.3), substituting [z_^ - h^x, t) ] dt for dn_^(t), and 
passing to the limit as dt ->■ 0, the likelihood function for (x q , h, a} is 
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p 558 




A 1 (x 

o 


- Vi) 


+ 




- h ± (x, 


:) 


dt 



a^(t)dt 


(4.4) 


where t 0 and T is the duration of tracking. Given data z(t) P the problem is 

to choose estimates of the initial condition, x , and acceleration function, 

o’ 

a(t), which maximizes the likelihood function p. 

4.3 THE FORM OF THE ESTIMATION EQUATIONS 

The estimation equations can be obtained as the solution of a calculus 
of variations problem of the Mayer type. Define the additional state vector 
component p(t), with 


p(o) = “ (x - y) T A 1 (x - y) 
l o o 


(4.5) 


and 


dp(t) _ 1 
dt 2 


k 

E 

i=l 


-1 


n , 


(t) 


- h ± (x. 


t) 



(4.6) 


Thus p(T) is the likelihood function, as per (4.4). Adjoin the p to (4.1) 
and construct the variational equation describing variations from the best 
estimate of the trajectory (denoted by "), given by 


dt 


5x" 

_6p 


F 

H 


OlpSxl [G 
+ T 

°JM [8 J 


6a 


where 6x(t) = x(t) - x(t) , 6a(t) = a(t) - a(t), and 


(4.7) 
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g T (t) = 


a (t) 
W 1 


-1 


> •••» 


a 2 (t) 
w 

m 


,-l 


a (t) 
m 1 


(4.8) 


F(t) = 


9f (x(t) ) 
9x 


+ [a(t)] 


3G (x(t), t) 
9x 


(4.9) 


H(t) = - £ 


i=l 


a (t) 


n 


-1 


9h i (x(t), t) 


9x(t) 


[z ± - h 1 (x(t) , t)] 


(4.10) 


G(t) = G(x(t), t) 


(4.11) 


Introduce the state transition matrix associated with the variational equation 
(4.7), which is of the form 



t > s 


where U(t, s) is the familiar matrix 
has the property that, for fixed t, 


3x(t) 

3x(s) 


and the row vector 


T 

x (t. 


s) 


X T (t, s) = - U T (t, s) F (s) + H(s) ] (4.12) 

Note that A(T, s) is the Lagrange multiplier vector of [33]. The solution 
of (4.7) can then be written in the form 


L. 

$x(t) = U(t, o) 6x 4- f U(t, s) G(s) a(s) 

o J 


ds 


(4.13) 


6p(t) = 


X(t, o) + 1 6 x q + j [x T (T, s) G(s) + a(s) T R 1 (s) 

o J o 


6a(s)ds 

(4.14) 


- 47 - 


where 


R(s) 



0 


0 

% 2(s) 

m 


(4.15) 


If p(T) is to be maximized with respect to arbitrary variations in x q and 
a(t) , it follows that 6p(T) must be zero for all variations 6 x q and 6a(t), 
for if this were not true we could improve the value of p(T) by a first- 
order change in these quantities. Thus, (taking the transpose) 


A(T, o) + A -1 (i Q - n) = 0 


(4.16) 


R(s)G (s)A(T, s) + a(s) = 0 


(4.17) 


where 


3 P(o) 


3x 


was obtained from (4.5). The solution for A(T, s) can be 


explicitly obtained in terms of U(s, o) , for it can be shown that 



o) 


F(s)U(s,o) 


(4.18) 


Then 


h iu ' 1(s ’ 0)1 - 


- [U 1 (S, o)] F(s) 


(4.19) 


and it is easily verified that 


A (T, s) = 


J* H(x) U(t, o) dt 


U 1 (s, o) 


( 4 . 20 ) 


satisfies (4.12). Define the familiar row matrix 


A ± (t) = 


3h ± (x(t) , t) ' 

T 

u(t, o) 


" 9h ± (t) 

3x 



3x 

L o J 


(4.21) 
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Then, from (4.10), (4.16), (4.17) and (4.20), we have 


0 = A 1 (x - y) 
o 


T k 


- / s W: 

o x=l L 1 


(t) 


-1 


T 

A.ht) 


z ± (t) 


h ± (x(t), t) dt 


(4.22) 


0 = a(s) - R(s) 


[U 1 (s, o) G(s) ] T f 


k 

E 

i=l 


n . 


(t) 


-1 


w 


(t) 


z ± (t) 


- h i (x(t)t) 


dt 


(4.23) 


Equations (4.22) and (4.23) are the results we seek. The solution of the 
estimation problem is the x q and a(s) which satisfies these equations. 


4.4 A NUMERICAL ALGORITHM FOR SOLVING THE PROBLEM 

Suppose that observations are made at discrete times {t^}, where the 
measured observation vector is, according to 4.2, dp(t^) = p(t^) ~ p(t± p. 

The computed (predicted) observation at t^, given the estimate of the state 

at t. , is [h(x(t. -), t. n ) ] dt . Assume the data has been normalized so 
l-l i-l 9 l-l 

2 

that a (t) = 1 for all i = 1, 2, . . . k, and define the data residual 
n . 
l 


r . 

l 


r(t.) 


[h(x(t i _ 1 ), t ± _ 1 )]dt - dp (t ± ) 


(4.24) 


3h 

If p(t) is a k dimensional vector (usually k = 1) then [— — jr-r*] is a k by n 
matrix (usually n = 6) and the matrix 




(4.25) 


is also a k by n matrix, where U(t) denotes the state transition matrix 
U(t, o) . Suppose that the apriori variance of x q is infinite (A ^ = 0) , 
so that the discrete form of (4.22) is 
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(4.26) 


£ A j r j - 0 = D A* r + £ A* r 

j=l 33 j=l 3 J j=i+l 3 3 

Setting U = V, and employing (4.26), the discrete form of (4.23) is 


a(t) = [R(t 


.) G T (t.) V T (t.)] £ A T r. 
i i l J 3 


t. £ t z t Bj - (4,27) 
i i+l 


If we let 


3. = £ A T r. 

1 j 3 3 


(4.28) 


then (for dt sufficiently small) the equations of motion are 


x = f(x, t) + G(x, t) a (t) = f + (GRG T V T )8 = f + <p& 


(4.29) 


Each time a data point is passed, 6 is changed simply by 3 + A r 3 . If 
6 = 0 after the last data point is processed, then the problem is finished. 
If not, then x^ must be changed. To see how much x q must be changed, define 


N(t) = 


3S(t) 


dx 


so that a variation 6 x q results in a variation 66 given by 


66 = N6x . We also define M = M(t) so that 6x(t) = U(t) M(t) 6x . Now 
o o 

6x satisfies the differential equation 


(fix) = F6x + <f >56 


(4.30) 


If M satisfies the differential equation where F = F(t) is defined by (4.9) 


M = V (J) N , 


(4.31) 


then 6x = U M 6x will satisfy the required differential equation as may be 


seen by substitution: 
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(6x) = dt" (U M 6x o )’ 


" U M fix + U M 6x 
o o 


= F U M fix 4- U V (j) N 6x 
o o 


= F <Sx + d) N 6x 


= F 6x + <f> 5 6 


(4.32) 


At each data time, the variation in r can be computed 


5r = 


3z 


3x. 


6x. 

l 


3z 


3x. 

l 


U (t . ) M( t . ) 6x 
l i o 


= A M fix 


(4.33) 


Now the change in the variation of B at each data time can be computed: 


B + A r + B 


6 B + A 5 r <5 B 


N fix + A A M fix N fix 
o o o 


N + A A M + N . 

Finally, if S 0 after the last data time then a variation of 6x^ = 
[N ( t) ] 1 B(T) should make it near zero. 

In summary, we have established the following algorithm: 

Initial conditions for integration 

x(t) = x q = approximate initial state vector (6x1 vector) 
M(0) = U(0) = I (6x6 matrices) 
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3 (0) =0 (6x1 vector) 


N (0) =0 (6x6 matrix) 

Equations to be integrated 

k = f(x, t) + (|) 3 , where <j> = (G R G T V T ) , V = U _1 

U = FU 

M = V <p N 

Data processing (performed each data time) 
r = [h(x) ]dt - dp 

3 + A T r 3 

N + A T A M -> N 

Initial state correction (performed after all data has been processed) 
<5x o = - [N(t)] _1 3(T) 

If 6x =0, the problem is finished; otherwise x + 6x -+ x , and the 

integration is restarted. After convergence, the acceleration function is 

given by a(t) = [R(t) G T (t), V T (t)] B(t). 

Note that the inverse of the state transition U(t) can be simply 
obtained. The standard variational equations are 

U = FU, U(t ) = I, U(t) =6x6 matrix 

where F is the matrix of partial derivatives of components of f with respect 

to components of q. The inverse of U is called the adjoint matrix : V = . 

T T 

This matrix can be computed by V = J U J , where 



This is known as the n Jacobi inverse" of U. To prove that this is the 
inverse of U note first that F has the form 
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0 


I 


F = 

x? 0 

where V T = It follows 

that <j> = VU + VU = JU T J T 

T T T T 
JU (F J + J F)U = 0. 

and V = U _1 . 

4.5 A SUCCESSIVE APPROXIMATION TECHNIQUE FOR SOLVING THE PROBLEM 

Existing orbit determination computer programs solve the nonlinear 
estimation problem when no random acceleration is present by a modified 
Newton-Raphson method. That is, equation (4.22) is linearized about the 
K— estimate of x q , the (K + 1)— estimate is obtained by solving a set of 
linear equations, and the procedure is iterated to convergence. This pro- 
cedure will obviously give an approximately correct estimate of the para- 
meter vector x^ if only small amplitude random acceleration is present. 

It, therefore, seems reasonable to suppose that the presently employed 
parameter estimation algorithm could be modified so as to reflect the 
presence of small applitude acceleration. Such a method will be developed 
here. 

2 

Assume the data has been normalized so that a (t) = 1 for all i, 

n . 

l 

and let 

- K th . 

x = the K — estimate of x 
o o 


that F T 4- J T F = 0. Letting <J> = VU, it follows 

mm. TTT TT 

U + JU J U = JU F J U + JU J FU = 

Since <f>(t ) = I and <J> = 0, it follows that <j> = I 


-K th 

a (t) = the K — estimate of a(t) 


Holding a (t) fixed, expand (4.22) to first order in x^ to obtain 
[x o K+1 - x q K ] = [A 1 + N K ] 1 | f [A T (t) ] K [z(t) - h(x K (t), t) ]dt 


- [A 1 (i K - V>) ] 
o o 


(4.34) 
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K K K 

where [A(t>] , [U(t, o) ] , and [G( t) ] are the previously defined matrices, 

til 

evaluated on the K — estimated trajectory, and the normal matrix is 



k T 

E / [A?(t) A.(t)] K dt 

i=l 0 


(4.35) 


The revised estimate of a(t) is 

T 

[S(s)] K+1 = [R(s) G T (s) V T (s)] K / [A T (t)] K [z(t) -h(x K+1 (t), t)]dt (4.36) 

s 


This algorithm is to be iterated to convergence, where the partial derivative 
matrices could be changed at each iteration in order to introduce the effect 
of nonlinearity. Note that equation (4.34) is identical to the form presently 
implemented in orbit determination programs, except that the calculated 
residuals reflect the effect of a(s) and successive iterations introduce 
changes in a(t) as an intermediate step. 


It is not at all clear that the suggested successive approximation 

technique will converge. The solution of the linearized problem can be 

thought of as a successive approximation technique for inverting a matrix. 

To see this, suppose that there is only one data type which is linear in 

x and a(t), that is 
o 


z(t) = A(t) x + 




L 

J* U(t, s) G(s) a(s) ds + 


dn 

dt 


(4.37) 


2 2 

Suppose that p = 0 and . (t) = for all t, and let equations (4.22) and 

(4.23) be written in the symbolic form (think of a(t) as the infinite- 
dimensional vector a) 


[A 1 + N] x + Pa = f- (z) 
o I 


O ry ty 

[I + a M] a + a Qx = a f„(z) 
v v x o v 2 


(4.38) 

(4.39) 
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where P, M and Q are linear integral operators denoting, respectively, 
the operations 


f [A T (t)][|^ (t)] J U(t, s) G(s) a(s) ds dt 


T t 

[U _1 (s ,o) G(s)] T / [A T (t)][|| (t)] f U(t, t) G(t) o(t) dx dt 


[U 1 (s,o) G(s)] T / [A T (t) A(t) ] dt 


and 


f^(z) = J A z dt 
o 


f (z) = [U (s , o) G(s) ] J A z dt 


Using equation (4.39) to eliminate a from equation (4.38), we have 

(A -1 +N-o 2 P[I+a 2 M] -1 Q) x = (f 1 (z) - a 2 P[I + a 2 M] -1 f„(z)) 
o v v o 1 v v 2 


(4.40) 


2 

If is sufficiently small, an approximate solution for x q is 

x o = [A _1 + N] _1 f 1 (z) 
o o 1 

and the corresponding approximation for a is 


(4.41) 


a = a v [f 2 (z) - Q x q ] 


(4.42) 


This procedure can be iterated, yielding 

[i o ] ±+1 - [i^] 1 = [A o _1 + N] -1 {f 1 (z) - [A o _1 + N][x Q ] i - PtS] 1 } 


(4.43) 
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(4.44) 


[a] 1+1 = G v 2 { f 2 (z) - M [a] 1 - Q[x] 1+1 } 

It can be shown that this process converges to the correct answer if 

is sufficiently small. Recalling the meaning of the operators P, M, and Q, 

equations (4. 43) and (4.44) are identical to equations (4.22) and (4.23) for 

the linear case. Thus it appears that the successive approximation technique 

2 

will converge for linear systems if is sufficiently small. 

4.6 ON THE ERROR STATISTICS ASSOCIATED WITH ESTIMATING UNKNOWN, 

. NON-WHITE ACCELERATION 

The estimation equations for determining x and a(t) depend upon the 
2 ° 

parameters a , which is the apriori variance of the white data noise, and 
2 n 

o ^ , which is the apriori variance of the white random acceleration. In 
practice, one might apply such an algorithm to simplify the task of esti- 
mating an unmodeled acceleration which is not white; that is, the postulated 

model is to be thought of as an approximation to physical reality. Given 
2 

a , a sub-optimal estimator is then obtained which depends upon the para- 
n 2 

meter a . Given the true error statistics of the unmodeled acceleration, 
w * 

the analyst must then determine the resulting estimation error statistics 

2 2 

as a function of a and choose a so as to achieve an acceptable result, 
w w 

In this subsection such an analysis will be illustrated by treating a sim- 
plified trajectory model which describes the motion over a short tracking 
arc. Considering the inevitable uncertainties in the apriori statistics, 

it is probably true that the simplified analysis will be adequate for deter- 

2 

mining a rule for selecting the parameter. 

4.6.1 The Simplified Problem 

Suppose the one component of the velocity variation from a perturbed 
trajectory over a short arc can be described by 

v = a(t) (4.45) 

where v is the velocity variation and a(t) is the unknown acceleration . i 


fThis model assumes that the gravitational acceleration can be represented 
as a function of time, so that gravity variations are zero in equation (4.45). 
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2 


and 


We postulate that a(t) is white acceleration noise with variance o , 
that the apriori variance of the initial condition v^ is infinite. The data 
consists of 


z(t) = v(t) + n(t) = V 0 + y t a(s)ds + n(t) 


(4.46) 


where n(t) is white data noise. Let 


B(t) = J a(s) ds 


(4.47) 


Then, applying Section 4.3, the maximum likelihood estimates (denoted 
by *) of v q and 3(t) are found from 


where 


i r 1 

v 0 *m = f / 


z(t) - g (t) 


dt 


. * 


6 (t) = -X' 


/ 


"k 

z(s) -B (s) -v 


ds 


(4.48) 


(4.49) 


X = 



If a(t) is indeed white acceleration noise the value of X is determined; 
otherwise, X is a parameter to be chosen. In this section the dependence of 
the estimation error statistics upon X will be examined. 

4.6.2 The Estimation Error Statistics 

Let 


£ 


V 



(4.50) 


e 6 (t) = S*(t)-B(t) 


(4.51) 


- 57 - 



then, substituting equation (4.46), 


e 

v 



e g (t) 


dt 



n(s) - e 6 (s) -e y 


ds - a 


and 


e 


6 



n 


a 



Since 


SgCO) = 6* CO) - 6(0) s 0 
Eg CO) = — ce (0) 


the solution of this equation is 


£g(t) = - ( j sinh At 


+ 



= A 


J sinh A(t-s)|e v 
o 



ds 



t 

cosh A(t-s) a (s) ds 


o 


(4.52) 

(4.53) 

(4.54) 

(4.55) 

(4.56) 


(4.57) 
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where we have integrated by parts to eliminate the a(s) and a(o) terms, 

2 

Assuming for the moment that E[^ v ] is negligibly small, the integrated 
acceleration (3) estimation error is described by the autocorrelation 
function 


r(t,x) = E|e g (t) e g (x) 


2 2 f T 

= A a n / sinh A(t-s) sinh A(x-s)ds 


*/7 cosh A(t-s 1 )cosh A (t-s 2 ) R(s ^ ds ^ds^ 


o o 


(4.58) 


where t > t and 


E jn(s 1 )n(s 2 )J 
E |a(s 1 )a(s 2 ) 



6 (s 1 -s 2 ) 


R(s 1 , s 2 ) 


(4.59) 


(4.60) 


2 

Note that E[e ( t ) ] = r(t,t). The initial condition estimation error is 
p 

described by (see equation (4.52): 


a 


2 


v 


E 


L 21 _ CT n 

2 sinh AT 1 ] 

l v | T 

l AT 1 \ 


T T 

+ ^2 J I r ^ S 2 ,S l' )ds l dS 2 ’ 

o o 


(4.61) 
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since, from equation (4.57) 


E n(s 1 )e g (s 2 ) = -Ac^ 2 J 2 sinh A(s 2 - s) 6(s 1 -s)ds 


(Aa^ ) sinh X(s^s^) s 2 >_Sj 


s > s 
1 b 2 


and hence 


IT o Z 

r ) f f E [ n( ' S l )e 6 (S 2 :i ] ds l ds 2 = 2 (^2") ( 
/ 0 0 


sinh AT - AT 


We also have 


J J E[n(s 1 )n(s 2 )] dSjds 


2 = 0 n 2T 


Equations (4.58) and (4.61) are the results we seek. 

4.6.3 Selection of the Tracking Time and Estimation Parameter 

The values of T and A can be chosen so as to minimize a bound on the 
2 

error variance a . Since a(t) is not white noise there is an M such that 


RCs 1 , s 2 ) £ M 
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From equation (4.48) we have 


,2 2 

A a 

n 


r(t,-r) <_ — y~ J t cosh Ut 


+ T 


2s)-cosh A(t-T)] ds 


+ M J cosh A(t-s)ds J X cosh A(x-s)ds 
o o 


A CT n 2 


[cosh At sinh At - At cosh A(t - t)] 


M 

+ — sinh At sinh At 


(4.66) 


Then the last term on the right hand side of equation (4.61) is bounded 
according to 



T T 

// r(t,x)dt dx <_ 
0 0 



(cosh AT - 1) (sinh AT - AT) 


M 2 

+ (cosh AT - 1) 

a 4 t 


and a bound on has been established. Suppose that we assume that AT 
is sufficiently large so that the (sinh AT cosh AT) terms dominate equation 
(4.61), and define 


a ^ (bound) = — y- [cosh k (sinh k - k) + 3 sinh k - k] (4.67) 

v 2k 


M • 2 

-t ;r— — (cosh k - 1) 

X k 


- 61 - 


where k = AT, Minimizing equation (4,67) with respect to A and T yields 


A 



(4.68) 


k = ATwl 


(4.69) 


Thus we conclude that A should be chosen according to equation (4.68) and 

2 

the optimal tracking time is T = 1/A. The corresponding bound on a ^ is 
then 


2 2 
a < 1.48 A a 
v n 


(4.70) 


Note that we get the intuitively obvious result that A -*■ 00 and T 0 as 

2 2 
M or as ->■ 0. Conversely, A 0 and T -* 00 as M -> 0 or as 00 . 

The latter case is usually assumed in orbit determination; that is, it is 

postulated that random acceleration is negligible and the entire random 

component of the data residual is attributed to data noise. 

4.6.4 An Application to Lunar Orbiter Tracking 

As an example of the application of these results, consider the case 
of tracking a spacecraft in orbit about the moon. Suppose the unmodeled 
gravitational effects produce a periodic acceleration of the form 


f The dimensions of equation (4.68) are compatible, for if M is in units of 
(ft 2 /sec^) and a 2 is in units of (ft 2 /sec), we have A in units of (1/sec). 
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a(t) 


a cos tot 


(4.71) 


where a 

a 


10 ^ m/sec 2 . 


The corresponding speed error would be 


- / 
o 


t 

a(s)ds 


a 

— sm ait 
to 


(4.72) 


and the maximum 3 is described by (a /oi) = 0.5 m/sec for 1/w = 5,000 sec 

a 

(approximately 1.4 hours). The acceleration autocorrelation function is 


R (t , t ) 



cos mt cos ojt 


(4.73) 


and M = 10 ® m 2 /sec^. Let the doppler data noise variance be a 2 = 60(10 2 ) 

2 n 
m /sec, corresponding to a one sigma "counted" doppler data error of 0.1 

m/sec for a 60 second count. Then 


4 (IQ- 8 ) 1/5 1 

L 60(10-2) j 245 


(4.74) 


T 


i 

X 


245 sec 


(4.75) 


a < 
v — 


.06 m/sec 


(4.76) 


Alternatively, if = 60(10 “*) , corresponding to a one sigma counted 
doppler error of 0.0032 m/sec for a 60 sec count, we have T = 24.5 sec. 

In this case, only a very short tracking interval is indicated, which implies 
that a sequential filtering technique should be employed with a dynamic 
model which includes state noise (a Kalman filter) . 
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5. PROPERTIES OF SEQUENTIAL ORBIT DETERMINATION ALGORITHMS 


5 . 1 INTRODUCTION 

It is shown in Reference [6] that the iterated, weighted least squares 
(maximum likelihood) , nonlinear orbit determination algorithm will converge 
under certain hypotheses. It is not possible, however, to give a similar 
proof for the particular sequence of computations known as the Kalman 
filter. In this method the information contained in early data points 
(or batches of data points) is stored in the form of an estimate and an 
error covariance matrix, and the revised estimate which incorporates the 
most recent data point (or batch of data points) is obtained as though the 
system were linear. In this case the neglected nonlinear effects could lead 
to divergence, which means that a given data point could cause the estimate 
to move away from the "true" value. A second source of divergence, which 
can also occur in the weighted least squares form of the estimate and in 
linear systems (where the weighted least squares and Kalman forms are 
theoretically identical), is numerical roundoff error. Both of these 
sources of divergence will be discussed in this Section, and a theorem 
which is the basis for sequential estimation will be presented. In 
Section 6 an analysis of the effect of nonlinearities in certain sequential 
estimators will be carried out in detail. 

5.2 THE LEAST SQUARES ALGORITHM 

We measure an N dimensional data vector z^ (that is, we take N 

observations) , and calculate for corresponding times the N dimensional 

vector z (x) which would be observed if the spacecraft were on the tra- 

jectory implied by the initial state vector x and the given perturbation 

model. We take the difference between the z and z , called the residual, 

M 

weight it according to the reciprocal of the standard deviation expected 
for that observation plus any cross correlations between observations. 

These are generalized in an N x N variance-covariance matrix W ^ . If the 
apriori variance of the initial conditions is infinite (A q ^ = 0) , then 
the orbit determination problem is to determine x = x^ so as to minimize 
the scalar expression 
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r(x) = (z c (x) - z m ) T W (z c (x) -z m ) 


(5.1) 


T 

If a decomposition of W can be found of the form W = R R, and if new 
variables g c (x) = R z c (x), = R are introduced, (5.1) becomes 


r (x) = (« c (x) - ' g M> T (§C (X) ' § m) ’ ° r 
= ||§C (X) ' g M|| 2 


(5.2) 


A program which minimizes (5.1) is called a minimum variance program, and 
a program which minimizes (5.2) is called a least squares program. It is 
sometimes said that the minimum variance formulation is more general than 
the least squares formulation and that to solve the minimum variance problem 
it is necessary to invert a very large matrix, W \ Neither statement is 
quite true, since, as seen above, any least squares program that permits 
the above sort of change of variable will solve the minimum variance pro- 
blem, and, in simplifying, it is only necessary to know the matrix R; it 
is not necessary to know W. The latter fact is important since R is gen- 
erally simpler to compute and, in practical cases, is a simpler matrix to 
handle than W. 

We consider that we want to minimize expression (5.2), which we 
rewrite as 


r(x) = ||h(x)|| 2 

where h(x) = g^(x) - g w . Let A be the matrix of partial derivatives of 
C M 

the components of (and hence of h) with respect to the components of x. 
The partial derivatives are evaluated at the point x^. Let x^ be an 
approximation. Then, expanding for a first order Taylor series about x q , 
we have 

r (x) = ||h(x o ) + A(x - x o )|| 2 (5.3) 
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Let $(v) be defined by 


$(v) = || h Q + Av || 2 


where v = x - X q and h Q = h(x Q ). Then (5.3) can be written r(x) = $(v). 
Hence, choosing x to minimize r(x) is approximately the same as choosing v 
to minimize $(v). 

The problem of minimizing $(v) is a linear, least squares curve fitt- 
ing problem with the solution 


v o = . (a T a) A T h o 

The differential correction process consists of finding the vector v q , 
letting x^ + v q be the next approximation, and repeating the process. 

The above abstraction is actually much oversimplified. In the 
practical problem, for example, $(v) is minimized subject to the side con- 
dition that the components of v should be within prescribed bounds. This 
prevents divergence in many nearly singular and/or nonlinear problems. As 
the iterations proceed, the bounds are permitted to grow or forced to shrink, 
depending on whether the iterations are successful. 

5.3 SIMPLE SEQUENTIAL PROCESSING 

In the following, a subscript of 1 indicates old data, and a sub- 
script of 2 indicates new data. The problem is to minimize $ (v) + 0 (v) , 

2 -L z 

where <K(v) = | + A^v| . Before the new data comes in, one can already 

find the solution 



With the new data the solution is 

v 2 = - (a, T A, + A 2 T A 2 ) ‘ (A! T h ol +A 2 T h o2 ) (5.4) 
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T 

Note that one can find by adding the "new normal matrix" to the 

"old normal matrix" A- T A- , and the new set of coefficients A n T h 0 to the 

1 1 T 1 ° l 

old set of coefficients A^ h Q ^. Hence it is, in general, not necessary 

to solve the whole problem anew each time new date comes in. 

Note that x + is the solution for the old data, and x + v 0 is 
o 1 9 o 2 

the solution with all data. Note that both corrections are intended to be 

added to the same nominal vector. Since the basic problem is nonlinear, 

after all the data has been processed, the final derived correction should 

be added to x^, and the process then repeated with all data collectively. 

The above process is then the ordinary differential correction process, 

except that one can^ point out intermediate results; one simply takes the 
T T 

matrix A A and the vector A h Q for the data "so far," and prints out the 
solution 

(a T a) A T h o 


for this data. Only when all the data has been processed does one actually 
use the solution v as a correction to x^; the intermediate results are only 
for information. 

One can prove the convergence of the general method described above 
under suitable hypotheses; but this is beyond the present scope. Two things 
are fairly simple however: (a) for a process to converge, it should have 

the property that if one starts out with the right answer, one stays there; 

V 

and (b) if the normal matrix 



-1 


is nonsingular, the method under discussion has property 
(b), one notes that the vector of partial derivatives of 
the components of x is just 


(a) . To prove 
r with respect to 


A T h 
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Hence at a solution 


A T h Q = 0, v =( aTa ) aT h 0 = o. 

5.4 SEQUENTIAL PROCESSING WITH SHIFTING NOMINAL TRAJECTORY 

In this method, one finds the solution for the old data, corrects 
the nominal trajectory immediately by x^ = x q + v^, and processes the new 

data with the new nominal. This results in a slight simplification since 

T T 

the h Q ^ term drops out of the right side of (5.4) 5 simply because A h = 0 

is a solution to the problem. 

More precisely, let a superscript of (0) mean evaluated at x^ , and a 
superscript of (1) mean evaluated at x^. Then the sequential processing 
method consists of evaluating 



As the processing continues, the nominal value of x changes each time new 
data is added. If the process is not iterated the algorithm is the Kalman 
filter. 

The claim is sometimes made that the above process avoids the need 
for iterations. A more accurate statement would be that it prevents one 
from iterating even though one should. The difficulty is that even if one 
starts out with the correct answer, one does not stay there, and hence the 
previously mentioned property of converging processes is not met. 
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To prove the last statement, let x 


be the true solution. Then 


( 0 ) 



Now excluding the exceptional case when x^^ is a solution to the old data 
alone and the new data alone: 



0 


Hence ^ 0 and x^ ^ x^. Expanding h^ in a Taylor series about x q gives 


(1) _ u (0) 


+ A 


( 0 ) 


V 1 + 


where in general the remainder term e is nonzero. 
Let 


E 




( 0 ) 

5 


F = 




(0) 
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satisfies 



Hence 




-F v _ 





This shows that if the problem is linear (that is, if E = F = e = 0) then 
= 0, and we would stay at the correct solution. If the problem is 
not linear, however, + v 2 ^ anc ^ we wou -^ not stay at the correct 



solution. Iterating on the data obviously will not correct this problem. 

5.5 RATE OF NONLINEAR DIVERGENCE OF THE KALMAN FILTER 

In this section we will investigate the extent to which the non-iter- 
ated sequential processing methods with shifting nominal (the Kalman filter) 
does not converge. Instead of attempting to prove convergence, we will 
outline a proof that under certain hypotheses the divergence is not too 
severe. 

2 

The basic problem is to choose x to minimize |h(x)| . A necessary 
condition for the minimization of h(x) is that 

A T h(x) = 0 


where the partial derivative matrix A is evaluated at x. 

In the Kalman filter method, suppose that the problem has been solved 
for all previous data, new data has arrived, and one wishes to solve the 
problem with the old and new data combined. A subscript of 1 will refer 
to old data and a subscript of 2 will refer to new data. The vector x^ is 
the orbital element vector after processing the old data and x^ is the 
orbital element vector after including the new data. A superscript of 
1 and 2 will refer to where a matrix or vector is evaluated. For example 
h^^^ means the residual vector for the new data evaluated at the old 
orbital element vector x^. 

After having processed the old data, one has an error 


E i - < a i < 1>>T h i W) 


If the old data had been processed exactly we would have = 0. 

In the sequential processing method, one computes a correction v 
by solving the system of equations 


(N. 


( 1 ) 


+ N, 


( 1 ) 


)v = - 


n - ) t 

<V > h 2 


( 1 ) 
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and letting 


x 2 = x 1 4* v . 


The main problem now is to evaluate 

T 


e 2 - (a i ( 2)) h i <2) + (a 2 (2)) h ; 


( 2 ) 


( 2 ) ( 2 ) 

Expanding h^ and in power series about the point x^ gives 

(2) _ h ( 1 ) j_ a CD 

‘1 " n i 


h 2 (2) = h 2 (1) + A 2 (1) v + e 2 . 


If the functions were linear, the errors ,, would be zero. Also, the 
errors 


P = A (2) - A (1) 
F 1 1 A 1 


p 2 - a 2 < 2 > - a/ 1 ’ 


would be zero if the functions were linear. 


E„ = 


+ ca/V h 2 < 2 > 


m T 

= (A 1 U; + P 1 ) (h x 

T 


(1 > + A, (1) v + 


O + (A„ (1) + P 0 ) T (h 0 (1) + A 0 (1) v + e 0 ) 




+ (A 2 (1) ) T h 2 (1) + N 2 (1) v + (A 2 (1) ) e 2 + P 2 T h 2 (2) 

E 1 + [(N i (1) + n 2 (2)) v + ( a 2 (1) ) T h 2 (1) l 
.(A ^) 1 e 1 + (A 2 (1) ) T e 2 + P 2 T h^ + p/ h^ 


( 1 ) 


( 1 ). 


T u (2) ^ D T u (2) 


E 0 = E, + (A/ x ') 6l + (A,'-') e 0 + P ± h ^ ' + P 0 h 


2 2 
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This formula indicates the growth of the error in the Kalman filter 
processing method. The added error in each step occurs because of the 
nonlinear terms reflected by ^2* iterating on the new data 

it is possible to get rid of the terms involving e ^ and ^2’ as will be dis- 
cussed in Section 6.) 

It is possible to use the mean value theorem to express the errors 
e^y P 2 terms secon d partial derivatives of the residual com- 

ponents with respect to components of x^. and then use the above formula 
to prove a rigorous theorem about the growth of the error in terms of 
bounds on these second partial derivatives. Since these bounds are usually 
not estimated easily it is felt that such a theorem would be of largely 
academic interest . 

5.6 DIVERGENCE DUE TO NUMERICAL ROUNDOFF ERROR 

It has been frequently observed that the sequence of computations 
called a Kalman filter sometimes produces numerically unstable results, 
even in the linear case. This can be caused by numerical roundoff error. 

In order to shed more light on this phenomenon, we will examine fairly 
completely the simplest non trivial problem in which a Kalman filter is 
applied. Even in this simple case, some of the results are fairly hard to 
obtain. It is hoped that a complete understanding of the simple case is 
useful in understanding the more general case. 

The results which will be shown are: (a) The "instability" is not 

really an instability in the usual sense. It is an increase in the variance 
and a bias in the mean of the estimate, and the increase and bias can be 
theoretically predicted. (b) The same phenomenon occurs by the "least 
squares" algorithm, but it shows up later. 


The case considered here is the simple problem of finding the mean 
of a set of number y. with mean y and standard deviation 1. 


1 k 
3=1 
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We will consider two algorithm for computing y^ sequentially: 

I : C = 0 
o 

S = 0 
o 


c k + i = c k + 1 


S k+1 S k + y k 


k = 0, 1, 


k+1 


k+1 C. 


'k+1 


II: 


V 1 = 1 


S 1 = y l 


V k+1 = V k ‘ v k (1 + V 


-1 


a k+l V k+1 S k+1 


These two methods are generally called the least squares method and 
the Kalman filter method in somewhat harder problems. (Actually method 2 
is the "Kalman filter method without shifting nominal". This fact 
does not affect the present argument.) 

If the arithmetic were done exactly, the results of both processes 
I and II would be the same. However, since the computations are done on a 
computer with finite word length, the results will be different. 

Consider a floating point machine with, say, 3 decimal digits. 
Suppose that is computed exactly. This is a reasonable assumption if 
the mean of the random numbers y^ is near zero; in this case there is no 
appreciable growth of roundoff error in the computation of S^. 

In method 1, there will be no error at first, and one will have 
= k. As soon as k = 1,000, however, the computation 1000 + 1 will 
result in 1000 on the 3 place computer. Successively adding 1 will not 
change the succeeding values of C^. Hence the computer will produce for 
method I 
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for k < 1000 


I: (computer) 



_1 

®k 1000 



for k > 1000 


The mean and standard deviation of a, can be easily computed 


a^ = y, = 1/y k , k < 1000 


- = _jl_ - = m 

a k 1000 y » °k 1000 


k > 1000 . 


This shows that the computer estimate of the mean will be biased and that 
the standard deviation of the estimate will grow for k £ 1000. 

An analysis of method II can be given in similar fashion. There is 

a difference in that the computations are not done exactly for early values 

of k; one can however show that the roundoff errors are stable. Hence 

assume that the computations in method II are carried out exactly for early 

value of k; v^ = — at first. The computations will break down when v^ + 1 

is indistinguishable from 1 on the computer. This happens when v, = .005 

2 k 

or k = 200. At this point also, v^_ is indistinguishable from v^ and we 
have = v^ for k > 200. Hence method II produces the result 


II: (computer) a, - 


1 k 

j-i 3 


a, = 


1 

500 


k 

z 

3=1 


k < 200 


k > 200 . 


The two methods are compared In Figures 5.1 and 5.2 with the results 
with no machine error. They illustrate that there is in fact a bias and 
and increasing standard deviation for both methods I and II. 
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One might object to the above simple case as being unrealistic for the 
reason that before the "instability” would show up on a realistic computer 
one would have to process an unrealistic number of data points; for ex- 
ample, 10^ data points on an IBM 7094, which would require about 10 ^ 
years. The problem would show up in a realistic time however, if one in- 
cludes an apriori first estimate of the mean with a small standard deviation 
When this happens, the above analysis still holds, except that the instab- 
ility occurs somewhat sooner. 



5.7 A THEOREM FUNDAMENTAL TO SEQUENTIAL DATA PROCESSING 


This Section concerns a method of obtaining and understanding a number 
of identities which are important in sequential orbit determination and 
other branches of applied mathematics. The idea is that a large number of 
results come from one simple theorem. It is hoped that this will provide 
some further insight into the nature of computations which are performed 
now and also suggest some new methods. 

Let m and n be two integers. While the theorem below’ does not depend 
on the relative size of m and n, the applications are generally for m > n. 
For example, in the orbit determination problem one typically has m = 6 , 
n = 1 . 

Let B, D, and A be fixed matrices. The matrix P is a variable matrix 
in the theorem below. The dimensions of these matrices are 

A: m x m 
B : m x n 
D: n x m 
P : n x n 

It is assumed that (DAB) ^ exists. 

Now define the m x m matrix V p as follows 

Definition: V p = I + B(DAB ) -1 (P - I) DA ( 5 , 5 ) 

Note that the computation of V p requires the inversion of an n x n matrix. 
The basic result is 


Theorem: V p V Q = V pQ 


(5.6) 


The theorem is easily verified by simply multiplying the two matrices V p , 
Vq and collecting terms. 


The importance of the theorem is that a certain subset of m x m 
matrices is isomorphic to the multiplicative group of n x n matrices. In 
other words, matrices which look like the matrix V p can be operated on 
(multiplication, inversion, square root extraction) by doing these operatior 
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on the smaller matrices P. Since the dimension of P is smaller than the 
dimension of a savings in computer time can result. This will be made 
clearer by the examples below. 

The following results can be obtained immediately from the theorem 


V 


I 


= I 


(5.7) 



(5.8) 



(5.9) 


Application 1: The Schur Identity 

As a first application, consider, the problem of finding the inverse 
of a matrix of the form W ^ + TUV. One can write 

-1 -1 -1 
(W + TUV) = W (I + TUVS ) 

= W Vp 1 

= W V (5.10) 

P 

To identify the matrix I + TUVS ^ with the matrix V^, we let 

T = B 

U = (DAB)' 1 (P - I) 

V = D 

W = A (5.11) 

The matrix P is hence computed from 

P = I + (VWT) U (5.12) 
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Now suppose that W is known and one needs the inverse of (W ^ 4- TUV) . 
One can obtain the inverse of this large (m x m) matrix by forming and in- 
verting the small (n x n) matrix P, finding V - and then W V - . 

P X P 

The above result is a disguised form of the well known matrix identity: 
(w -1 + TUV)" 1 = W - W T (U _1 + VWT) _1 VW (5.13) 

To prove this note that 


P 


-1 




-1 

+ VWT) 


(5.14) 


and 


-1 

V = I + T (VWT) (P - I) VW 
P 1 

= 1 - T (VWT) (P - I) P VW 

~ X -1 -1 _X 
= I - T(WT) (VWT) VU (U + VWT) VW 

-1 _X -1 

= I - T (U + VWT) VS (5.15) 

It then follows that W V - agrees with the right hand side of (5.13). 

P 

Application 2: Orbit Determination - Kalman Filter 

The matrix 

(N + g g')' 1 

appears in orbit determination theory; usually with N and 3 being 6x6 
and 6x1 matrices respectively. If one knows the inverse of N one can 
find the inverse of (N + 6 6 1 ) as in the last section. 
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-1 


-1 


(N + 3 3') = N (I + 6 B 


N h 


-1 


where 




V 

P 


-1 


N = A 
3 = B 
3 * * D 

P = 1 + 3' N _1 6 (scalar) . 


If one performs the algebra corresponding to equation (5.15) above, the 
result is the "Kalman filter'* equation- This is expected, since the Kalman 
filter equation is known to be a special case of the Schur identity. 

Application 3: Orbit Determination - Square Root Method 

Suppose that one knows a matrix R such that 

R* R = C 


and one wishes to find a matrix R^ so that 

(c" 1 + B 8')' 1 ■ Sj . 

This can be done as follows 

(C -1 + 3 3 ' ) _1 = R' (I + R 3 3' R') -1 R 
= R' V p R 


= R' 



R 


" ( V f 1/2 W <>2 » 
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Hence = v ^/2 R 

In the above derivation we set 

A = I 
B « R3 
D = 3 1 R f 

(DB) -1 (P - 1) = 1 
P = 1 + | R 6 | 2 . 

We also used the fact that V is symmetric. 

P ' 

Application 4: Variations (Partial Derivative) 

In orbit determination, one frequently has occasion to compute the 
derivatives of a function like 


f(x) = X I X I p . 

Where X is a vector; the variation of this expression is given by 

6f(x) = |X| P V p+1 <5X 


where 


V p+ 1 = I + P 


XX 


This expression is useful in deriving, for example Taylor series 
methods for integrating equations. It is also useful in deriving varia- 
tional equations. For example, if the equations of motion are 


X 


|x| 


3 


X = — y 



one can write the variational equations immediately 


(6X) 
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V_2 <5X 



6. ESTIMATION WITH A SLIGHT NON-LINEARITY 


6.1 INTRODUCTION 

The effect of non-linearities has been discussed throughout this 
Report. In general, little can be said about this source of error, or 
about any corrective approximation which might be introduced, unless the 
non-linearities are small. 

It will be assumed that the dynamics is noise-free so that only the 

data is contaminated by noise. The state variables x of the system will 

here be chosen to be certain constants of the motion, such as orbital 

elements or initial position and velocity. The only non-linearities arising 

are then those in the relations between the observations y. and the state x. 

i 

These non-linearities will be assumed small. We shall consider, then, the 
problem of estimating quantities x , given a set of observations 


y . = a. 
i ia 


+ e iaB 


+ n. 
i 


( 6 . 1 ) 


where the n^’s are independent standard normal variables, and summation is 
understood over repeated Greek indices. The coefficients of the non- 

linear terms are understood to be small, and their squares and products will 
be neglected throughout. It may be assumed that the apriori covariance A 
of the x^ is infinite; alternatively, apriori information may be included as 
a first data point. 

Three forms of the estimator will be analyzed and their biases compared. 
These will be (i) The least squares fit, (ii) The sequential (Kalman) estimate 
with linearization of the latest residual about the previous estimate, and 
(iii) An "iterated-sequential" scheme involving iteration through the latest 
data point designed to reduce the non-linearity in the latest residual. 

6.2 THE LEAST SQUARES FIT 


This estimator obtains that set of values which cause 


X>i - a ia x a - e ia3 x a x f3 )2 = 


mxmmum 


( 6 . 2 ) 


- 83 - 



Thus 


0 


? (a ia + Ze ia T i Y ) (y i ‘ a iP X P " e iP« % V " 0 (6-3) 


That is, if denotes 2>ia a ip > then 


N aP % = £ a ia y i - I>i 


ia E ipy + 2 ? £ iap % y i 

X 1 


2 £ a i|3 8 iav x 3 X Y ’ 


(6.4) 


In the right member of (6.4) (neglecting e ) , x^ is replaceable by 


(N) fta £ a ia y i 


Substituting for y. in terms of n. and x , the distribution of x - x 
° y x l a a a 

may be investigated. 

In particular: 


E | x a “ x aj = ~ ( S a i3 e iy6 ) (N) ap (N) yfi ’ ( 6 - 5 ) 

and 

E j (x a “ x a } (x (3 " Vl X } 

= ~2(N)^ (N)"l Z( a iy e i6e + a i(S e iYe ) x e (6.6) 

6.3 THE SEQUENTIAL FIT 

This estimate is obtained successively as follows: 

* (i) 

is that which causes the sum of previous squares, represented by 
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N 


(i-D 

a(3 


( X a " x a X l} j ( x p ~ x p X 1} ) + U) - minimum 4 (6.7) 


where the first term on the left represents the sum of squares of previous 
residuals, and where p^(x) is the i— residual linearized around i.e., 


P± (x) = y. - a ±a x a 


<i ‘ l) - B._. i (i ‘ : 


"lap A p 


i-D jCi-D _ ( 


a. + 2e. D x n 
ia iap p 


(i-l)' 


\ a a f 


( 6 . 8 ) 


and 


N 


(i) = N (i-D 


ap 


ap 


(■* 


+ la. + 2e. _ _ 
ia iay y 


(±-D) 


X- •)(. 


+ 2e x^ 1 1 ^' 
ip ip 6 $ 


(6.9) 


Thus 


“a^ ( i p 1) - i p i ' 1> ) = »i ( i(i_1) ) ( a ia * 2e iar ;< v' 1) ) ' < 6 ' 10 > 


Hence , 




0 li (i) - x 1 - (i (i - 1) - x ) 

3 \ X p P ) ap \ p P/ 


- ?i ( i<1 ' 1) ) ( a ia » 2e i« 4 i_1) ) 


(** 


+ la. + 2e. x^ 1 

ia iay y 


)( a i 




ip + 2e ip6 x r^ ) )( x p i_i; - x p) 


p • 


( 6 . 11 ) 


Substituting for y_^ into in terms of n^ and x, and summing over i from 
1 to m(i.e., m data points): 
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{ 4 m) - *») - J^i * e i P « (^ i ' 1) - x g) - x J] 


x a. + 2e. 

i~ 


- (i-l) 

ia -iaY *y 


( 6 . 12 ) 


In the right member of (6.12) the is replaceable by 


*P + 




-1 

Pe 


i-l 

£ n i a ie , where 

j=l J J 


^(i-D = y- a a 

pe jP je 


(6.13) 


A (ill) 

In order to investigate the distribution of - x^ it is necessary to 

observe that is itself a random variable. In fact, 

a (3 


N 


(m) _ 

a(3 


+ 2 

aP 


m / 

S (■. 


, .(i-l) 

ia e ipY + a ip S iay) X Y 


lay) 


(6.14) 


so that 


[* W C = £(Vi 


c + a. c s. 

10 8 16 iye 


:) 4 i ' 1) . 


(6.15) 


We can now obtain: 


- (m) 

x a " x a 


[ H<m) ]ag [ n i a i P + 2 n i £ i 


C-(^- i)+ a . 0 £. . 
ipy X Y *2 ^ 


' 0 
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X 



)] 


- * [«"]:* MZ\£ p it n o **). 


(6.16) 


On the right of (6.16) we may use 


jU-D , x . 
y y 


(m) 


-1 i-1 


[**']« £ * 




(6.17) 


The distribution of x - x is now available. The covariance 

a a 


*„)| *! 1 


is the same as that for the least-squares 


fit to the same data, so that N is identical with The bias, however, 

is more serious: 



(6.18) 

6.4 THE ITERATED-SEQUENTIAL FIT 

This estimator is obtained as follows: 

A ( x) 

is that x^, obtainable as the limit in an iterative procedure, which 

causes 
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k - $**>) k - ^ (1 - 1) ) 


+ (x) = minimum 


(6,19) 


where p*(x) is the i— residual linearized not around x*^ ^ but around 
1 

the desired x , and 


= Mf ^ ^ + (a + 2 s / a + 2 s //- 20 } 

a(3 V \ a ia + ^ e ia Y X Y / \ ±P i|35 X 5 / # (6 ‘ 20) 


=• i*a)\ / a 

"ia-Y X Y / ( i[3 


-*(i-l) 

The iterative procedure consists of linearizing p_^ firstly around x 5 

-*(i) 1 

obtaining from the minimization a first estimate x , re-linearizing 

-*(i)l -*(i)2 2 

around x to obtain a second estimate x , etc. Since e is negli- 

gible, the next iteration is optional, and any further iteration pointless. 
We see that 


*(i-l) ( _ ^*(1-1) 
N ap \ x p X P 


- p i ( i(i) ) ( a ia * 2e iap Y (l) 


) • 


(6.21) 


and hence 


/(i) (f( i) . x \ N *a-i) /-*d-D _ x \ 

"ap (p p/ ap \ p p) 


P) ( 


a. + 2 e. x 
ia my y 




+ (a. + 2e + 

\ ia ia Y y / 1 ig 


)k t2e iP6 i * (i) )k (i) - y) 


( 6 . 22 ) 


Substituting for y_^ into p_^ in terms of n^ and x, and summing over i from 
1 to m. 
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"ip 6 






X 



+2e. x 
icxy y 


' J 


(6.23) 


On the right of (6.23) we may use 



(6.24) 


Again it is necessary to 


pre-multiply by the inverse of N 


*(m) 


given by: 






. e. „ 

iy e 


+ a. „ 
10 




(6.25) 


The distribution of x 


*(m) 


x is then available. The covariance 

a 


E 



is again the same as before. In computing the bias E <x 


(m) 


v ~*(i) 

account must be taken of the correlation between n. and x 

i Y 


result is: 



The 
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It is noteworthy that in the sequential estimates of Sections 6,3 and 6.4 
the early non-linearities have much greater effect than the later ones, 

N v J . The iterated-sequential 


due to the presence of 


t/ 1 ^ 


-1 


or 


fit, moreover, replaces by which by itself would be an improve- 

ment, but introduces also a multiplicative factor 3, which is adverse. 


6.5 AN EXAMPLE 


Suppose that we are making repeated measurements of a single quantity 

x with varying non-linear coefficients and that we have an apriori 

- 2 1 
estimate x of x with variance a . Here we put a - — , e - o, a - 1 
o o o o 1 

(i 1 1) . The least-square bias is 


E x - x 




(6.27) 


The sequential bias is 


E 





m 


E 

i=l 



m 

4 E 


i=l 


s . 
1 





( 6 . 
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and the iterated-sequential bias is 


— , -^(m) 

E x - x 


m 

- 


(6.29) 


^ A ftn) 

Except for the first non-linearity, whose effect in x is less than in 
x^ m \ at least if o q is large (i.e. > > 1) , the early non-linearities, 

~ * (m) . ^ (m) 

which have the larger effect, are more serious in x than xn x , 

whereas the most recent non-linearities affect x about three times as 

This rather paradoxical feature 
is not apparent in the simpler two-batch analysis in chapter 5 (see 
1st paragraph on p.73). In the case of equal biases (e^ m e) and 
large m, we have: 


* A *(m) 

much as they affect either x or x 


E x - x 


_e 

m 


E |x (m) - x 


— In m 
m 


(6.30) 


and 


E {x - xj — ^ In m 
m 


(6.31) 


It is tempting to conclude that a sequential fitting procedure should be 
iterated only at the outset. (!) 

6.6 ANOTHER SEQUENTIAL PROCEDURE 

2 

An estimate equivalent (for negligible e ) to the least-square fit is 
available if third as well as second degree terms are carried forward in 
the expression representing the sum of the previous squares. This 
"higher-order sequential estimate 11 x+ (i) is defined as that x which minimizes 


-91- 

I 

I 

<, 

i 



(6.32) 


Pi 2 * Qi.i W - Ng 1 - 1 ’ (x a - if 1 - 1 )) (x p - if' 1 - 1 )) 


3 - 4 <1 - 1) ) k - 4 (i - 1) ) K - 4 <1 - 1) ) * p 2 


^ it ^ i — 

Again, this is obtained by first linearizing p. around x 1 , expanding 

~ (i-1) 1 

Qi_l (x) to 2nd order in x - x v , obtaining thus a first estimate 
xf (i) l, 


it<«2 


( x ) an ^ are then re-expanded around to obtain 

, etc., and a further iteration is optional. The next and E are: 


N 


;Mi) _ . E (i-1) UHD . 

afy \ 


a.p 


ag 


Y 


Y 


+ la. + 
I la 


2 e. i^i) 
iay y 


)( 


a. ft + 2 e 
ip ip{ 5 


(6.33) 


and 


( a 


= E^ 1 ' ) + 2 (a. e + a. Q e. + a. e 
aPy aPy V ia xPy xp xay iy 


iap) 


(6.34) 


This improved sequential estimate required, of course, not only more 
storage but the calculation of all the second partial derivatives 

6.7 AN APPLICATION 

As a possible application of this analysis, we shall develop an 
algorithm for computing the effect of non-linearity in the estimate of 
position of an incoming vehicle (see Figure 6.1) based only on range 
measurements from a tracking station in the plane of motion. 
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Figure 6.1 Tracking Geometry 


Linearize 
indicated first 


the orbit description about the circular orbit through 
observed position, i.e.. 


- b cos <|) 

(6.35) 

= b sin <)> 

(6.36) 


where (x,y) is a rectangular system moving on the nominal circular orbit, 
b is the magnitude of the initial position error, which may be assumed 
to be predominately perpendicular to the line of sight since the initial 

range measurement itself should be fairly accurate. Thus <J> s 0 , and 

• • 

the 3 unknowns b, x^, y Q may be assumed as zero mean with known variances 
and zero correlation. Define the departure from the nominal orbit, 
ignoring the initial range error, by means of 


\ 


1 

1 


“ b “ 



X 

X 2 

— 

o 

n 



to 

3 J 


-_n . 


- 93 - 


(6.37) 


where n is mean motion on reference circular orbit. 
Then: 


(:) ■ ■■«(?' 


(6.38) 


where the 2x3 matrix B is 


B(e) = 


COS i>(u - 3 COS (v e) ) I sin(0 o -e ) 


^sin fa + 6 cos ^(sin(e o -e)-e o+ e) -2(i-cos(e o -e)) 


2(l-cos(0 o -9 )) 


4 sin(9 -e)-36 +3e> 


(6.39) 


The Measured range is 


V 2 2 

(R 0 + H + x - R 0 cos 0) + (Rg sin 0 - y) + random error (6.40) 


where the random error has variance o , and 

P 


R^ + H + x - R~ cos 0 


p x = 


(6.41) 


p y 


-R 0 sin 0 + y 


(6.42) 


(R 0 sin 0 - jY 


xx 


(6.43) 
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( 6 . 44 ) 


(R 0 + H + x - R 0 cos qY 


yy 


(R 0 sin 0 - y)(R 0 + M + x - R 0 cos 9) 


*y 


( 6 . 45 ) 


We assume that the non-linearities in expression for p in terms of A^ 
arise predominantly from non-linearities in p as a function of (x,y) rather 
than in (x,y) as functions of A_^. [These latter non-linearities have 
already been ignored in writing equation ( 6 * 38 )] 

Then 


where 


(°\ q ) + p'(e)B(e)x + | x T B T (e)p"(e)B(e)\ 


p 

[x=y=0] 


1x3 


3x3 


( 6 . 46 ) 


X = 



p'(e) = [p , p ] 
y 


, p"(e) = 


x=y=0 


XX 


xy 




yyJ 


x^r=0 

( 6 . 47 ) 


The previous analysis now applies (A here is x there) , where 


B(»j) 

a = 1,2,3 1x3 


( 6 . 48 ) 
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(6.49) 


- 2 ^ B^(e ± ) P"<Oi) 8(6^) 
a 3 p = 1,2,3 3X3 


We may suppose that the 1st, 2nd, .... observations are counted by 
i = 4, 5, ... and reserve i « 1, 2, 3 for apriori variances: 




V 


CL, 


2a 


a, 


'3a 




mean motion 




X 


^ e ia3 


= 0 for i=l,2,3. 


(°- °> J~ ) 


We can now proceed to compute the biases in 




(6.50) 


(6.51) 

(6.52) 


(i) according to (6.5), for the least-squares fit, 

(ii) according to (6.18) and (6.26) for the sequential fits. 

Finally the biases in extrapolated in-plane position, at say 0=0 , 
are given by: 



(6.53) 
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7. CONSIDERATION OF SYSTEMATIC ERRORS 


7 . 1 INTRODUCTION 

As a practical matter it is usually not feasible to estimate all of 
the orbit parameters if the dimension of the state vector is high. For 
example, it is obviously impossible to estimate all of the coefficients of 
the spherical harmonics describing the gravitational potential of the 
central body, even though any or all of them might have a significant effect 
upon the solution of the orbit determination problem. Ideally, one would 
like to apply the method described in Ref erence [ 7 ]to determine those linear 
combinations of parameters which affect the data weakly so that they can be 
deleted from consideration. In practice, this is done on some intuitive 
basis, and perhaps some significant parameters are left out. These unesti- 
mated parameters are called systematic errors. 

In the presence of systematic errors one sometimes resorts to a 
"consider option," which can take on several forms: (1) weighted least 

squares - estimate the desired parameters as if the systematic errors were 
not present, where the inverse data noise covariance is the weighting 
matrix, but reflect their contribution to the estimation error in the 
calculation of the estimation error covariance matrix. (2) minimum 
variance - reflect the presence of the systematic errors in the estimate 
of the desired orbit parameters and in the calculation of the estimation 
error covariance matrix, in such a way as to obtain the minimum error in 
the estimated parameters. This approach is equivalent to estimating all 
the orbit parameters, including the systematic errors, (3) general form - 
choose an arbitrary form of the estimator which is computationally 
convenient and produces an acceptably small error covariance matrix in the 
presence of systematic errors. The general form of course includes (1) 
and (2) as special cases. 

It is the purpose of this Section to develop a collection of methods 
for obtaining estimation error covariance matrices when the general form 
of the consider option is employed. A linearized relation between the 
data and state will be assumed, which, from the point of view of the non- 
linear theory of Section 3, can be thought of as a linear expansion about 
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the converged modal trajectory. It will be further assumed that no state 
noise is present, but this case will be treated in Section 4. 

7.2 THE CONSIDER OPTION FOR A SINGLE DATA BATCH 

Suppose the vector z of observation residuals is given by 

z = Ax + Bs + n (7.1) 

where x, s, and n have the usual meanings, i.e. x is the vector to be esti- 
mated, s is the vector of system parameter errors, and n is the random 
vector of observation noise. It will be assumed in Subsections 7.2 and 7.3 
that the apriori variance of x is infinite (A ^ = 0) . A linear estimator 
L converts z into an estimate x of x : x = Lz. In view of (7.1) this 
relation becomes 


x = LAx + LBs + Ln (7.2) 

The estimator L will be called unbiased if LA = I. This property defines 
the general form of the consider option, and will be assumed throughout. 
Hence 


x - x = LBs 4- Ln 

= Cs + Dn (7.3) 

There are two interpretations of the vector s. It may be regarded as 

a constant but unknown vector. We know only that the best estimate s of s 

is s = 0, and that our confidence in this estimate is given by a covariance 

- T 

matrix £ . We may then define E(s) » s = 0, and E(ss ) = E . With these 
s s 

conventions s may be handled mathematically as if it were a random vector 
with mean 0 and covariance E g . This is the second interpretation. In 
either case, (7.3) implies 

E = cov(x - x) = DI n D T + CE g C T (7.4) 

T T 

where V = E(nn ) and it is assumed that E(sn ) = 0. 
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Two well known estimators are the following: 

Weighted Least Squares Estimator This tried and true estimator is given by 


t _ /A T--1 A .-1 A T_-1 
L _ — (A £ A) A £ 

LS n n 


The corresponding estimate will be denoted by x^g ; the covariance matrix 
(7.4) by Z Lg . 

Minimum Variance Estimator This estimator provides the minimum covariance 
matrix (4.4). It is given by 

= (A T WA) _1 A T W , 

-1 T 

where W = V + B£ B . This is easily shown: Any other estimator may be 

S T -1 T 

expressed as L = K + (A WA) A W. Then LA = I implies KA = 0, and it 
follows that (7.4) reduces to 

kw -1 k t + (a t wa) -1 . 

T -1 

This is a minimum for K = 0. It follows that £^ = (A WA) 

It is usually impractical to use in making estimates of x alone, 

since the calculations required are equivalent to those required to estimate 
s in addition to x. The matrix £^ is calculated by using a weighted least 
squares estimator to estimate the vector (*) from the data 


(:.)■(: :)(:) *(:.)• 


where E(n n ) = £ . The estimator in this case is 


s s 


"LS 


T -1 -i-l T -1 

/A B\ /T 0 \ /A B\ /A B\ /T 0 v 

lo J \ o z-V lo J (o i) (o z" 1 ) 
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The resulting covariance matrix (7.4) is 


'A T r 1 a a t t 1 b 


-l 


T -1 T -1 -1 

b r a b r b + z 

s 


The upper left portion of this matrix, which is just cov(x - x) , is equal 
to as may be shown using the Schur identity. 

The main interest in is that it gives a lower bound for the 

MV 

obtainable tracking accuracy. 

7.3 THE CONSIDER OPTION FOR TWO DATA BATCHES 

Suppose two tracking runs result in observations 


= A^x + B^s + n^ 

z 2 = A^x + B^s + n^ . (7.5) 


Suppose linear estimators and are applied to give estimates 

X 1 = ^1^1 X + ^1^1 S + L l n l 
= x 4- C^s + 

*2 = L 2 A 2 x + L 2 B 2 s + L 2 n 2 
= X 4- C^s + D^n^ 

It is desired to combine these estimates to give an estimate x^ which is 
better than either and to compute the covariance of the error x^ - x. 
Assuming a linear unbiased combination, all schemes may be written as 

= Mx^ + Nx^ , (7.6) 


where M + N = I 
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Thus , 


x + 


(MC + NC 2 )s + (MD 1 , 



(7.7) 


and 


Z^ = cov(x^ - x) = 


T T 

MD 1 I^Dpi + 


T T 

nd 2 r 2 D 2 N 


+ (MC 1 + NC 2 ) 2 g (MC 1 + NC 2 ) T 


= (M,N) 




(7.8) 


where 

T = E(n.nT), Z. = Z. + Z. = D. T . dT + C.Z cT, i = 1, 2, and 6 
i 1 1 1 m ic ill is 1 * * * 

These definitions will be used frequently in what follows. 


C-E C 
1 s 


T 

2 ’ 


The foregoing is quite straight forward but it emphasizes the fact 
that, once the matrix M is selected (and hence also N = I - M) » the 
resulting covariance matrix (7.8) is determined and needs merely to be 
calculated. Looking at the matter differently, whenever a matrix E^ is 
purported to be the covariance matrix of a combined estimate, there must 
be a corresponding choice of M related to E^ by (7.8). Any deviation from 
this rule may be regarded with suspicion unless accompanied with an estimate 
of the difference between E ^ and a covariance matrix calculated from (7.8). 

Two combinative procedures will be discussed. 


Weighted Least Squares Combination Using the notation of (7.5), this 
method consists of letting 




(A 1 W 1 A 1 + A 2 W 2 A 2 ) 


-1 


< a X a i> 


"ls - (A l"l A l + 


aJj 2 a 2 ) 


-1 


(A 2 w 2 A 2 ) 
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where = P^ \ ^and it is assumed that and are independent. 


If x^ and were obtained using the weighted least squares estimator de- 
scribed in the previous section, then this choice of M and N results in a 
simple combination of data batches 1 and 2 into a larger batch which is 
then processed with a weighted least squares estimator. In other words. 


the tracking matrices 
tracking matrix 


T * T 

[A.W.A. i A.W.B.], i = 1, 2 are combined to form the 

ill) l i 5 


I A X A 1 + A ^2 A 2 ! A K B 1 


+ a 2 m 2 b 2 ] 


A A 

This is no longer valid if and x^ were formed in any other way. 

Minimum Variance Combination If the estimators and are fixed, then 
how can M be chosen to obtain the minimum covariance, of x^ - x ? 

T 

Assuming E(n^n^) = 0, the answer is given by 






T -1 
0 ) 



(£ 1 


0 ) 


-1 


n mv " (£ i ' e)_1 + 


T -1 

(E 2 - e T ) 


i-i 


T -1 

a 2 - e T ) 


where 1^, and 9 are as given by (7.8). This is easily shown by re- 

placing N by I - M in (7.8), expanding in M and completing the square. 
The resulting covariance is 


e mv (m mv’ n mv ) 


£ 1 6 


01 V V n mv 



T 


-1 


- z 2 - + h - (e + 9 >> 
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Any other choice of M results in a covariance matrix E^ where 

E 3 “ Z MV = (M " V (E 1 + S 2 " (9 + ° T))(M ~ V 1 ’ 

Although the minimum variance estimator of Section 1 is difficult to 

calculate, the choice M = is easily calculated if the matrices C_^ , D^, 

E g and are available. These in turn depend upon the estimators 

and L^ 9 which may well be taken as least squares estimators. In this case 
the C and D matrices may be readily computed. 

7.4 THE INFLUENCE OF APRIORI INFORMATION 

Suppose the estimate x^ is available along with a covariance matrix 
E^, the M a priori" covariance, but that the dependence of E^ on and 
r is not known. We may assume that 

= x 4- C^s + n^ , 

but the matrices C and D are not known. They may be regarded as variable 

-L t m 

but constrained so that C_E C n + D- F D_ = E_ . 

1 s 1 111 1 

It is often stated that the covariance of the combined estimate is 
given by ^ which is computed from the augmented tracking matrix 

tA 2 W 2 A 2 + "i 1 | A 2 W 2 B 2 1 

in the same manner employed for a weighted least squares estimator. This 
is a dependable covariance matrix corresponding to the combinative pro- 
cedure M = + E^) lf_ C^ = 0. If sf 0, the matrix E^ 

is not correctly calculated. It is the purpose here to investigate the 
magnitude of the error. 

Consider the combinative method given by 

M = (A^W^ + z" 1 ) z" 1 . 
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Let be the corresponding covariance matrix computed from (7.8). 
The following formulas are easily established: 



(7.9) 


hence 


t 1 T T 

E 3 - Z 3 - M0N +N0 M . 

It is seen that this error will be small compared to the size of 0 if 

0 or N ^ I, i.e. if the noise errors are much smaller than the system 
parametric errors. But 0 is about the same size as the matrices E^ and 

E^, hence, in this case E 3 is a reasonable approximation of 

Since the pseudo covariance matrix E 3 could be unrealistically small, 
it would be useful to know under what circumstances it is at least greater 
than the minimum variance covariance matrix E^. Equation (7.9) shows that 

E 3 - e mv ■ ME / + n£ 2 nT - Vli 

" f ’W eN MV " n mv 9 m mv + h mv e 2 n mv 

- V"MV + K(E 2C - <E 1 + E 2n )) “' r ' 
where Q = (E^ - 0) + (E^ - 0^) . 
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Otherwise the situation is 


It is clear that Z 0 > Z if Z on > Z 4- Z . 

3 MV 2C “ 1 2n 

not so clear. 

A possible approach to establishing that Z^ £ Z^ would be to show 

the existence of matrices M and N = I - M such that Z^ may be computed from 

(7.8) using M and N. However it can be shown that a necessary and suffi- 

cient condition for the existence of such matrices is Z„ ^ Z^. This 

3 MV 

approach then is not helpful. 

7.5 A SIMPLIFIED ANALYSIS OF STEADY STATE ORBIT DETERMINATION 

Suppose we have a steady state (constant geometry) orbit determination 
situation where the state of the system can be estimated quite well during 
a short tracking interval, but where the effects of state noise and unmod- 
elled parameters add disturbance between tracking intervals. In order to 

optimally weight the data and to describe the tracking error, it is 

necessary to determine a realistic error covariance matrix which is charac- 
teristic of the operational tracking system. 

Assuming one continuously estimates some limited number of system 

parameters, ignoring state noise and unmodelled parameters, the hypothe- 

T -1 

tical error variance (A WA) would go to zero as time goes to infinity, 
(Figure 7.1 - curve 1). Consideration of these unmodelled effects in the 
calculation would yield an error variance (7.4) which becomes unacceptably 
large with time, (Figure 7.1- curve 2). Neither of these calculations is 
meaningful, however, for in an operational system one would de-weight early 
data to reflect the information loss due to unmodelled effects, (Figure 7.1 - 
curve 3). For example, previous information might be completely ignored by 
using only the most recent tracking interval to estimate the state. Such 
an approach is a special case of the data weighting methods discussed above, 
where the covariance matrix describing prior information is degraded in 
some less arbitrary fashion. 

Theoretically, de-weighting prior information corresponds to postu- 
lating systematic errors acting between tracking intervals. Thus the most 
rational way to treat the problem is to determine the statistics of this 
disturbance, and weight the data according to the true error covariance. 

A broken curve will result since the tracking is intermittent (Figure 7.1 - 
curve 4). The necessary calculations can be carried out with existing 
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Suppose the unmodelled parameters and state noise act only between 
tracking intervals to perturb the state by an amount Ax. Then let 


X. 

1 

= constant 

(7.10) 

x+1 

= x. + Ax 
i 

(7.11) 

z , 
1 

= A x. + n. 

l l 

(7.12) 


where x is the state vector, i denotes the ith tracking interval, Ax is the 
state disturbance, n^ is white data noise, z^ is data. The A is a (constant) 
matrix, which is supposed to be of full rank. Let the variance of Ax be 


E [Ax ] = R(At) 


where At is the time between tracking intervals. Thus, 


At At 


K -/ /[tflM [§ 

0 0 l J L J L 


dx ds 


(7.13) 


(7.14) 


where q is the perturbation acting in At and Q(t,s) is its autocorrelation 

function. Let P be the "steady state" covariance matrix, obtained at the 

end of a tracking interval (Figure 7.1). The uncertainty at the beginning 

of a tracking interval is P + AR, and the information added by a tracking 
T 

interval is N = (A WA) , where W is the inverse data noise variance. The P 
matrix is obtained from 


The solution of (7.15) is 



(7.15) 


P = 


|l + 4 N 




(7.16) 
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If the norm of 



is small, we have 


P = 



= N 


-1 


- N 1 R 1 N 1 


The estimate at the end of a tracking interval is 


(7.17) 



A T Wz. + 
1 


(p + p)- 1 (<) 


(7.18) 


where [+,-] refers, respectively, to the end and beginning of the interval. 
Thus (7.17) describes the desired steady state error covariance matrix, and 
(7.18) shows how to optimally combine the most recent data with prior infor- 
mation. 7 These results can be easily extended to the case where the normal 
matrix N depends upon i, or where the unmodelled parameters and state noise 
act during the tracking interval. 

The analysis developed here could be used in the following way: 

a. define the disturbance covariance R(At) from a (perhaps 
pessimistic) analysis of the state noise and unmodelled 
parameters . 

b. define an acceptable value of R(At) and determine the 
time between tracking intervals to achieve that value. 

c. given the data z in any tracking interval, find the 
estimate at the end of the tracking interval by (7.18). 

Note that only the P matrix is needed for error analysis purposes and for 
definition of the tracking pattern. The validity of this approach could 
be checked by a Monte Carlo simulation with equation (7.18) introduced as 
the "fit world" estimator. 


^From (7.17) and (7.18) it can be seen that prior information is ignored 
only if R = infinity, in which case P = N and (P+R) ^ = 0. 
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8. TREATMENT OF CORRELATED DATA 


8.1 INTRODUCTION 

The weighted least squares (WLS) form of the maximum likelihood esti- 
mator can treat correlated data by employing the weighting matrix W = T \ 

T 

where T is the data noise covariance matrix T = E[nn ]. This is a large 
matrix, with dimension equal to the total number of data points, and in 
general it is not practical to invert and store T. 

In certain special cases the inverse is easily obtained. For example, 
if the data is uniformly spaced At seconds apart and is exponentially 
correlated, we have 

E[n(t )n(t ± )] = a 2 exp (- a|t - t.|)= a 2 (8.1) 

2 ]_ 

where a is the (stationary) variance, — is the correlation time constant, 

a 

and, 


y = exp (- a At) 


Then 



( 8 . 2 ) 
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Hill Mill III IIIIH ■mi 


iiiii ii iiiii mi iiiii ■■■iii 



and 


r 1 = w = 


2,. 2. 

a (1-y ) 


-y 0 0 


-Y (Y +1) -Y 0 


o -y (y +i) -y 


o -y (y 2 +i) -y 

0 -Y 1 _ 


(8.3) 


This weighting matrix causes adjacent data points to be differenced, as can 
be seen by performing the multiplication Wz, where z is the data vector. 
This approach is equivalent to adding a new state variable which is the 
correlated noise, as is sometimes done in Kalman filtering. Essentially, 
the inverse (8.3) can be easily found because (8.1) describes a first order 
Markoff process. 


The treatment of more general forms of data correlation is not so 
straightforward. In this Section we will develop a practical method of 
processing exponential-cosine correlated data by a differencing method, 
and it will be shown how a general data differencing technique can be used 
to treat other types of correlation. It will also be shown how to trans- 
form the problem to that of estimating an unknown acceleration. As before, 
a linear system with no state noise will be assumed, which corresponds to 
a linear expansion about the modal trajectory. 

8.2 STATEMENT OF THE PROBLEM 


Consider the problem of estimating the initial state vector of an 

orbiting vehicle from a linear combination of noisy measurements. Let the 

equation relating the measurement vector z, to the initial state vector 

x , be 
o 


z = Ax + n 
o 


(8.4) 


where A = 9z/9x and n is the vector of zero-mean random noise on the 
o 

measurement. Then, the well known weighted least squares estimate of x^ is 
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(8.5) 


x = (A T WA) A T Wz 

O 

and the covariance of the estimate is 

l WLS = (A T WA) A T WrWA(A T WA) (8.6) 

T 

where W is a diagonal weighting matrix and r = E[nn ] is the covariance of 
the noise. If W ^ = r, then the estimate is a minimum variance one and 
(8.6) becomes 


"MV 


= (A T WA) 


-1 


T -1 

= (Ar X A) 


-1 


(8.7) 


When T is not a diagonal matrix, i.e., the data are correlated, then 
the minimum variance problem can be transformed to the WLS problem if a 
matrix S = [s. .] can be found such that 


T -1 

s ws = r 

In this case, the estimate of x becomes 
* o 

X Q = (B T WB) B T Wq (8.8) 

where B = SA and q = Sz. Because most orbit determination programs do not 
save more than one row of the A-matrix at a time in solving the WLS problem, 
transforming the minimum variance problem to the WLS problem as indicated 
above is only feasible if a small number of rows of the A matrix must be 
saved. That is, only a small number of elements of each row of S are non- 
zero. Section 8.1 has shown how to transform exponentially correlated 
data into uncorrelated data by this data differencing technique and the 
following sections will show how to treat data with an exponential cosine 
correlation by data differencing. The inverse problem of determining what 
type of correlation can be handled by differencing a given number of 
measurements will also be examined. 
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An alternative to the above approach is to augment the state vector 
with noise parameters which are perturbed by white noise, and solve for 
this new state vector in the presence of an "unknown acceleration," (see 
Section 4). This approach, which is analogous to the way correlated 
measurement noise is handled in the Kalman filter, is discussed in 8.5. 

8.3 OPTIMAL PROCESSING OF DATA WITH EXPONENTIAL COSINE CORRELATION 

8.3.1 The Noise Model 

Let the autocorrelation function for the measurement noise have the 

form 


R. . - E[n.n. ] 
1 3 i 3 


R. . = (cos gx . . + y sin gx . . ) exp (- ax . . ) 
ij ij iJ ij 


(8.9) 


where t.. = It. - t.| = x.. , n. = n(t.) = the ith component of the vector 
ij 1 i J 1 Ji i i 

n, and a, g, and y are constants. This is a realistic noise model since, 
for y = a/g, (8.9) represents the autocorrelation function of the output of 
a linear, second-order system forced by white noise. The equation of 
motion for such a system is 

ii + 2a n + (a^ + g^)n = w(t) (8.10) 


where w(t) is zero-mean, gaussian white noise with variance a . The auto- 
correlation function of the steady-state response of this system is given 
2 2 2 

by Eqn. (8.9) if a = 4a(a + g ). This will also be true for the transient 
w 

response if the statistics on the initial conditions are 

E(n ) = E(n ) = 0 
o o 

E(nJ) - 1 

E(A = a 2 + B 2 
o 

E(n n ) =0 
o o 
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8.3.2 Solution For Equally Spaced Data 


For equally spaced data, the solution of Eqn. (8.10) obeys the diff- 
erence equation 


n 


i+2 


C l n i+1 


c n. = u. 
ox x 


( 8 . 11 ) 


where 


t ■ *i - 'i-i > 


-2ax 

c o = e 


O “ aT o 

c ^ = 2e cos 3 t , 


-2ax 


u . = 


1 3 


/ i+2x 

w(s) sin g(t i + 2x-s)e a ^ t i 


L t i+x 


ds 


-/ 


h+x 


w(s) sin B(t^-s)e S ^ds 


( 8 . 12 ) 


The autocorrelation function of u is then 


E[u lU ] 



if | i- j | > 1 
if |i-j| = 1 

if i-j = 0 


where a 


2 

o 


1 - 


-4ax 

e 



sin 8x cos 


3x 


(8.13) 


= 2e 


-2ax 


sinh 2ax - — sin 23 x 


(8.14) 
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(8.15) 


-Ml 


sin gx cosh ax - cos gx sinh ax 


]/ 


2ax 2 
e a 

o 


The new variable u. is only correlated with u. ^ and u. tn9 its nearest 

1 J l-l i+l 5 

2 

neighbors. An uncorrelated set of variables with variance o q can now be 
formed from the {uA by letting 


v. 

l 


a. 

i 


(u i - a i v i-i } 

P , i 


4 (8.16) 

> 1 (8.17) 


where v, = u, 


= 0 


For N large, then the sequence of defined in (8.17) converges to the 
value 


a°° 


1 

2 



l/2i 


1/2 


sign(p) 


N » 1 


(8.18) 


Notice that a real solution of (8.18) only exists for |p|< However, 
from (8.15) this can be shown to hold for all positive values of a, g, 
and x. 

Combining the above results with Eqn. (8.4), denoting the ith row 
of the A matrix as A ^ , yields 



q i = 

B. x 
1 o 

+ v ± , i = 1, 

y 

2 9 • • • 5 

N - .2 

where 

q i = 

(Lz . 

l 

- 

/ 

2 

" a i ’ 



B. = 
1 

(LA. ■ 

l 

- ._,) /A 

2 

" a i » 

i > 1 , 


B l = 

LA i 

• 
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L is a lineaj: difference operator defined by 


Lz. = z., 0 - c-z.,- + c z. 
x i+2 1 l+l o i 

rn rp rp rn rp rp rp rp rp 

Thus, letting B = [B^, B 2> B N _ 2 ] , q = [q 1> q 2 , q N _ 2 ] 

-I 2 

and W = o q I, Eqn. (8.8) can be used to obtain a WLS estimate from the 

T T 

new data vector q. Note that the calculation of B WB and B Wq by summing 

T T 

the individual elements B^W^B^ and ^.-W^q^ requires the temporary storage 

of A ±> A ±+1 , A ±+2 , z ± , z ±+ 1> z i+ 2 » A 1 , B i _ 1> and q^, in addition to the 

constants c^, and c^. Also, note that there are only (N-2) of the new 
uncorrelated observations as compared to the N original observations; no 
practical way of forming N uncorrelated observations was found for data 
with exponential cosine correlation. In this case, the covariance of the 
estimate is 

T -1 
I = (B WB) 

£ = (A T S T (Sr.S T ) _1 SA) _1 (8.19) 

where S is an (N-2) x N transformation matrix. If S were invertable, then 
(8.19) would reduce to Eqn. (8.7) and the estimate of x becomes a minimum 
variance one. However, further study is required to determine the conditions 
under which £ closely approximates £ ^ . 

The above method assumes the data are equally spaced. If they are 
not, then, in general the computation of q^ and requires the storage of 
all data prior to z^ +2 an ^ their associated A^'s. This would require an 
extraordinary amount of storage and is impractical for use in a computer 
program. Since most real data will not be equally-spaced, due to data 
editing or missing points, practical implementation of the method des- 
cribed in this section requires forming 'Mummy* 1 observations at the missing 
data points. These "dummy" observations could be computed by interpolating 
between adjacent observations. 
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8.3.3 A Simple Example 

To illustrate the method described in Section 3.2 and compare it to 
the WLS technique, consider the problem of estimating a constant scalar 
parameter. Then the measurement equation (Eqn. (8.4)) is 

z = x q + n (8.20) 

and the variance of the WLS estimate is 


Z WLS 


h " 

N i,j=l 


R. . 
ij 


( 8 . 21 ) 


If R_ is given by Eqn. (8.9) and the measurements are equally spaced, then 
(8.21) becomes 


E 

WLS 


< (l+2p) 
o 


nb: 


, 0 -2aT ax a . -ax 0 . 

+ 2e p(l-e -g sin gx - e cos gx) 

, 2 -2ax -Nax 1 / 2 2 /Q 00 x 

+ pa e + e C__ /N B- (8.22) 

O IN J/ 1 


— ry.T — 2 fY, T 

where = 1 - 2e cos gx + e , 


(8.23) 


= e aT ^cos(N+l)gx + sin (N+l)gxj - 2 


a 


cosNgx + — sinNgxJ 


+ e 


-ax 


cos (N-l)gx + J sin (N-1)3tJ , 


(8.24) 


and a and p are defined in Eqns. (8.14) and (8.15), respectively, 
o 
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On the other hand, forming the new uncorrelated data type using the 
technique of Section 8.3.2 transforms the measurement equation to 


q = Bx + v 
o 


(8.25) 


and the variance of the estimate is 


£ = (B T WB) 1 


= a 


N-2 , 

£ Bl 

i=l ] 


(8.26) 


where 



. . , N-2, 


a l = 0 


(8.27) 

(8.28) 


For NB., large E TTTO becomes 
1 WLb 

£ WL s = a o (1 + 2p)/ NB 1 ’ NB 1 1 (8.29) 

Approximating a_^ in Eqn. (8.27) by a = a 00 and substituting in Eqn. (8.26) 
yields 

o 2 n (1 + 2p) 

£ — 2 . N » 1 (8.30) 

NB^ 

Thus E ^E^^ for NB^ >> 1. However, since B^ can be quite small for small 

values of a t and 8x, N can be fairly large while NB^ , is still small. In 
these cases both terms of (8.22) must be considered and it is difficult to 
see how E and E compare. Table 1 shows how and E behave for 

values of ax and $x ranging from .01 to 10 for intermediate values of N. 
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It is interesting to note that the data differencing technique compares 
very closely with MV when WLS perforins poorly, and the data differencing 
technique does poorly when WLS compares closely with MV. Further analysis 
is required, however, to compare the MV, WLS, and data differencing techni- 
ques for more realistic problems. 


at 

( 3 t 

N 

£ 

MV 

z 

WLS 

£ 

Z/Z 

' WLS 








.01 

.001 

12 

.98 

1.0 

38 

38 

.01 

.01 

12 

.95 

1.0 

19 

19 

.01 

.1 

12 

.28 

.89 

.38 

.43 

.01 

.1 

24 

.15 

.63 

.18 

.28 

.1 

.01 

12 

.79 

.92 

3.8 

4.1 

.1 

.1 

12 

.66 

.85 

1.9 

2.2 

.1 

.1 

18 

.55 

.74 

1.2 

1.6 

.1 

1 

12 

.036 

.044 

.038 

.85 

1 

.1 

12 

.27 

.29 

.38 

1.3 

1 

1 

6 

.30 

.31 

.46 

1.5 

1 

1 

12 

.16 

.16 

.19 

1.2 

1 

1 

24 

.081 

.082 

.090 

1.1 

1 

10 

12 

.041 

.042 

.045 

1.1 


Table 1. Variances of MV, WLS, and Data Differencing 

Techniques for Different Values of ax, Bx, and N. 
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8.4 A GENERAL FORM OF THE DIFFERENCING METHOD 


The general technique of forming a normalized, uncorrelated set of 
measurement from k data points can be applied to other types of correlated 
data. Difference equations which the autocorrelation function of the data 
must satisfy can be determined and, if an autocorrelation function which 
satisfies these equations approximates that of the actual data, then this 
method may be used to satisfactorily uncorrelate the data. The following 
paragraphs show how to form an uncorrelated measurement set fr.om 3 data 
points and the extension to k data points is easily obtained. 

Form the new variable, q^, from the original data by 


q i 


= ( Z . - 


a.z. .a - b .z. -) 
l i-l i i-2 / 


K. 


1 = 3, 4, 


N 


(8.31) 


where 



E(zJ) = 1 . 


The requirements that 



if i * j 

if i = j 


allow a., b. and K. to be determined from 
i x x 

ECq^-i) = o , 

E(q i q i-2 ) = 0 * 

E(q^) =1 , i = 3, 4, .... N 


(8.32) 


(8.33) 

(8.34) 

(8.35) 
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and constrain the autocorrelation function to satisfy 


E(q^q^) =0 , j < i-2 , i = 4, 5, . .., N (8.36) 


Substituting Eqn. (8.31) into (8.33) - (3.35) and noting that E(q.z.) 

1 J 

for j < i yields 


= 0 


a. = (R. . - R. . R. . 0 )/(l - R. _ . „) 

i i,i-l l-l, i-2 i,i-2 i-I, i-2 


b. = (R. . , - R. . . , R. . .)/(l - R. . . .) 

l i,i-2 i-I, i-2 i,i-l i-I, i-2 


K. = 

l 


1 - a.R. . . - b.R. . „ 

i i,i-l l i,i-2 


1/2 


For equally spaced data, Eqns. (8.33) - (8.35) become 


a ± = a = R 1 (l - R 2 )/(l - R^) 


b ± = b = ( r 2 - R^)/(l - R \) 


K 


i = K = V (1 - R 2 )(l - 2 R^ + R 2 )/(l - R p 


where ^ ^ . Eqn. (8.36) yields the second order difference equation 


\ + 2 - a Vl - lR k ' 0 • 3 s k s N “ 2 


(8.37) 


where k = i - j. The solution of (8.37) is of the form 

\ = A i (r i) k + A 2 (r 2 )k ’ 3 * k * N " 2 (8.38) 

where 

r l = f + \/d~ , 

r 2 = f " v/d" , 

d = (|) 2 + b . 
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and can be chosen to make R^ and approximate the correlation of 
the real data. 

Thus, if the data is such that Eqn. (8.38) can be used to approximate 
the autocorrelation function, then the method described in this section can 
be used to uncorrelate the data by differencing three terms. 

8.5 TRANSFORMATION TO THE PROBLEM OF ESTIMATING AN UNKNOWN ACCELERATION 

By interpreting the correlated data noise as the output of a second- 
order linear system (Section 8.3), the problem posed in Section 8.2 can be 
transformed to that of estimating an unknown acceleration as described in 

Section 4. To do this, augment the state vector to include n 

♦ 

and n , the correlated noise and its first derivative at epoch t =0. 
o’ r o 

Then, the linearized equations of motion of the new system become 

X = F X + Hw (8.39) 


where 


X = (x T , n, n) , 



H = [o 0 1 ] T • 

The measurement equation then becomes 

z = [a B c] X Q + v (8.40) 
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where B = 9z/3n , C = 9 2 /Bn , and v is zero-mean white noise uncorrelated 

o o’ 

with n. Although not included in the problem stated in Section 8.2, the white 
noise v allows the problem to be formulated as in Section 4. 


8.6 CONCLUSIONS 


The preceding sections presented three methods of processing correlated 
data. Of these, the two data differencing techniques are most easily imple- 
mented in a WLS orbit determination program. Before they are actually used, 
however, a comparison with the conventional WLS method should be performed 
to determine whether their implementation is warranted. As Magness and 
McGuire (Reference [34]) and the results of Table 1 have shown, the WLS 
estimate is often quite close to the MV estimate, even in the presence of 
highly correlated data. 
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9. CONCLUSIONS AND RECOMMENDATIONS 


Orbit determination may be thought of as a hypothesis testing problem, 
in the sense that a statistical model of the data is postulated, an appro- 
priate estimator is derived, real data is processed, a "best-estimate" 
trajectory is calculated, and the data residuals are examined to see if they 
are consistent with the statistical hypothesis. If not, the model must be 
changed or a rationale for accepting the result must be devised. From the 
theoretical point of view, it is necessary to develop a firm mathematical 
basis for any estimation algorithm so that the fundamental validity of the 
estimate is not in question. From the practical point of view, the esti- 
mator must be feasible to implement. With these considerations in mind, 
this report has discussed some of the theoretical and practical questions 
arising in sequential processing of tracking data. 

In general it can be concluded that the maximum likelihood (weighted 
least squares) estimation technique, which has been much used in orbit 
determination work, is still the most practical algorithm for nonlinear 
orbit determination, sequential or otherwise. Several questions still 
remain to be investigated, however: 

(1) the separation of estimation and control - the discussion of 
Section 3 offers a rationale for the presently used approach, but counter- 
examples can be constructed to show that such a separation is not correct 
in general. This subject needs to be pursued further, in particular, non- 
hamiltonian systems should be studied. 

(2) the modal trajectory vs. the marginal mode - equations (3.24) and 
(3.28) show that two different forms of the maximum likelihood estimate can 
be constructed for nonlinear systems, whether control is present or not. 

This subject needs to be pursued further. 

(3) nonlinear error analysis - throughout this report various treat- 
ments of nonlinear error analysis have been presented. For example, in 
Section 3.8 it was suggested that the bias in the estimate can be found by 
solving a certain differential equation. Such methods should be studied 
further and tested. 

(4) estimation of unknown acceleration - this problem, which arises 
primarily in the treatment of model errors, has never been properly treated 
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in practical applications. It needs to be. 

There are, of course, many other aspects of estimation theory which 
need to be examined for particular applications, such as the treatment of 
numerical errors, systematic errors, and correlated data. In general, it 
is recommended that some of the approaches described in this report be 
tested by numerical simulation, so that experience can be gained to suggest 
efficient design of future orbit determination computer programs, and, if 
necessary, further research . 
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